




























































































(Undergraduate Laboratory Coordinator)
Updated on July 4, 2019
7 Appendix A Code Comments
8 Appendix B Useful Terminal Commands
9 Appendix C Working Remotely
10 Appendix D ASCII Code
Introduction
The purpose of ECE 273 is to teach you the basics of Intel 80x86 assembly language. You will learn in the ECE 272 lecture that assembly language (or simply "assembly") is important because it is the principal link between the software world of high-level languages like C and Java and the hardware world of CPU design. Assembly language is the lowest-level, human-readable programming medium we can use to express complete application programs. Assembly language gives full access to the programmable features of the hardware, so a good understanding of it will provide valuable insight into the fundamentals of CPU design, the operation of the datapath, and program execution.
Since the creation of compilers and assemblers, assembly language programming as an art has virtually disappeared from the face of the Earth, so of what use is it to you? There are several major advantages to understanding assembly language. First, compilers translate high-level languages into assembly language, so compiler writers must understand assembly. Operating systems also include critical components written in assembly. Furthermore, embedded and mobile device programming often requires knowledge of assembly language. As these technologies become more and more important to the overall performance and flexibility of computer systems, knowledge of the computer at the assembly-language level will prove to be a valuable asset. Even if you spend your entire career programming in high-level languages, a basic understanding of assembly language concepts will give you an insight into your work that will in turn make you more valuable as an electrical or computer engineer.
With these considerations in mind, ECE 273 will not strive to make you a proficient assembly language programmer. However, as with most programming languages, you simply cannot grasp the key concepts by mere discussion. Therefore, you will, for a semester, become an assembly language programmer just like the "hackers" of old. ECE 273 is a laboratory class, meaning we will provide an environment for you to gain hands-on experience with the tools and concepts used in the course. This approach also means that you will only get from it what you put into it. The more time you spend working on your program, the more you will learn from it, and the more you will understand about how and why assembly language works the way it does.
The whole laboratory class provides a study of assembly language from the point of view of a high-level language, namely C. For example, C provides a control structure called the for loop, and we will (eventually) discuss how to implement a for loop in assembly language. As such, a good knowledge of C is necessary to fully understand and succeed in the laboratory assignments. The goal of the first lab is to introduce you to the tools you will need throughout this course. Most of what you will need to know to be successful in ECE 273 will come from a collection of documents available on the web at http://www.clemson.edu/ces/departments/ece/resources/ECE273Lab.html. This site includes: background material on the Intel 80386, which outlines the features of the CPU and its respective assembly language; the GNU Debugger (GDB), although it is not required (but still encouraged) for this course; and information on navigating your way around the UNIX terminal. The Appendix of this manual also provides resources for successfully completing each lab.
For development and testing, any flavor of Linux should suffice, as they are all UNIX-like. If you are new to UNIX, you may need to seek out a more complete reference at the library or online and spend some time familiarizing yourself with the fundamentals. For each lab, you must be able to log onto a campus UNIX machine and edit files with any of the standard text editors (vi, gedit, nano, pico, etc.). However, when working in any Mac Lab, you are encouraged to use Xcode on any of the iMacs. An IDE such as Xcode is an efficient way to work on your laboratory assignments. Please note that the iMacs in the 321 Riggs Lab or any other Mac Lab at Clemson are not directly compatible with the assembly code in ECE 273. This means your assembly programs cannot be tested on them; the
Student Objectives
The goal of Lab 1 is simply to introduce you to the basic tools and procedures you will use to write, assemble, link, execute, and debug your programs. The task is simple: create an assembly program and run it to demonstrate what it does. If you are already familiar with the UNIX operating system, this assignment will be trivial.
All of your assignments will consist of two program source files. One is a C program that sets up the assignment; it is referred to as the driver. You must not alter this file in any way, or your assignment may not work properly when your lab instructor tests it. The second source file is an assembly language file that implements one or more functions called by the C program. Some of this file will be completed for you; it is referred to as the assembly stub. Both files are provided in the lab manual following each lab's introduction and discussion. They can be copied from the lab manual; however, typing them out manually will give you more experience with the structure of C and assembly programs. Note that code copied from a PDF version of the lab manual may not paste in the same order as it appears in the manual. Check your driver code carefully for copy-and-paste errors before asking for help.
Lab 1 requires no additional code other than what is provided in the lab manual. As such, we only need to save, compile, and test it out! To do so from the Mac Lab on campus, we must first find a machine with the same ISA and system call conventions. Fortunately, the College of Engineering and Science (COES) has some Linux machines for us to use. They are called the Apollo machines, and there are 16 of them, named apollo01.ces.clemson.edu, apollo02.ces.clemson.edu, ..., apollo16.ces.clemson.edu. Note that the number of the machine is a two-digit number from 01 to 16, and any machine listed above will work for compiling and running your code. (Simply pick one that is currently online.)
First, we need to transfer our code (the C driver and assembly files) from the Macs, where they are developed, to the Apollo machines, where we will compile and test our program. If you have had some experience in the terminal, you might be aware of the commands SCP and SFTP, which allow the user to securely transfer files from one machine to another. You are more than welcome to use these commands; however, the Macs have a custom program for seamlessly accomplishing this task: cesmount. Essentially, cesmount establishes a graphical SSH session with an Apollo machine using your Clemson username. This appears on the iMac Desktop as a removable disk drive named after your username. The advantage of using cesmount is the following: when working with your code in an IDE, such as Xcode, on the iMacs, you can save your code onto your drive (mounted by cesmount) as if it were a flash drive you had plugged in yourself. Now, when you log in to the Apollo machines to test your code, the same directory in your cesmount-ed drive is used as your home directory on any Apollo machine. So, if you save your code on your virtual disk drive, it will automatically be updated/mirrored on the Apollo machine you are using. Pretty cool! For example, say you just tested your code on an Apollo machine and realized you need to make a change. Simply open up Xcode (or your IDE of choice), make the change, resave the file to your drive, and voilà, it is updated on the Apollo machines and ready for you to test, almost instantly. There is no need to use a command to transfer your files each time you update them.
That was likely more information than you needed to know in order to do the labs. But, as ECE majors, hopefully you found it interesting and perhaps inspiring. So, let’s get started with Lab 1. From here, it is assumed you have just logged into an iMac in the lab. If you have any trouble during the following procedure, please ask the instructor or a neighbor for assistance.
1.2 Assignment
browser sessions (e.g. email, Blackboard, SISWeb, iROAR, etc.). It is also important that you eject your user drive mounted via cesmount. To do so, right click the drive on your desktop and click Eject. If your iMac does not have its right click enabled, hold down the control key and click the icon simultaneously. On the pop-up menu, click Eject. Alternatively, if you have your terminal window open, you may run the cesunmount command. Upon success, it will display a message confirming your drive has been removed, and you will see it disappear from the Desktop. If you have any trouble ejecting your drive, please ask the instructor for assistance; otherwise, your data could become compromised if left accessible on the machine.
The following is the C driver. Do not modify this code. You are not required to add comments to the driver.
    /* begin C driver */
    #include <stdio.h>

    int main(int arg, char *argv)
    {
        char buffer[256];
        do {
            int i = 0;
            printf ("Enter a string terminated with a newline\n");
            do {
                buffer[i] = getchar();
            } while (buffer[i++] != '\n');
            buffer[i-1] = 0;
            /* asum() is the function implemented in assembly */
            i = asum(buffer);
            if (i) {
                printf ("ascii sum is %d\n", i);
                continue;
            }
        } while(1);
        return 0;
    }
    /* end C driver */
The following is the assembly solution to the int asum(char *) function. You are required to add comments (program and function headers only) to this file. However, you are not required to understand the implementation details of this code at this point in the course. For Labs 2 through 6, you will be given an assembly stub file instead of the solutions. The assembly stub will require you to apply the topics discussed in each lab in order to form a completed solution to the assignment.
    /* begin assembly code */
    .globl asum
    .type asum,@function
    asum:
        pushl %ebp
        movl %esp, %ebp
        subl $4, %esp
        movl $0, -4(%ebp)
    .L2:
        movl 8(%ebp),%eax
        cmpb $0,(%eax)
        jne .L4
        jmp .L3
    .L4:
        movl 8(%ebp),%eax
        movsbl (%eax),%edx
        addl %edx, -4(%ebp)
        incl 8(%ebp)
        jmp .L2
    .L3:
        movl -4(%ebp), %eax
        jmp .L1
    .L1:
        movl %ebp, %esp
        popl %ebp
        ret
    /* end assembly */
    /* Do not forget the required blank line here! */
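Although you are not expected to follow the assembly yet, it may help to see roughly what it computes. The C function below is a sketch of our reading of the listing, not part of the lab files; the name asum_ref is hypothetical, and the comments pair each C statement with the instruction it is assumed to mirror.

```c
/* Hypothetical C model of asum(): adds up the character codes of a
   NUL-terminated string. Not part of the official lab files. */
int asum_ref(const char *s)
{
    int sum = 0;                  /* movl $0, -4(%ebp): running total      */
    while (*s != 0) {             /* cmpb $0, (%eax): stop at terminator   */
        sum += (signed char)*s;   /* movsbl sign-extends the byte, addl    */
        s++;                      /* incl 8(%ebp): advance the pointer     */
    }
    return sum;                   /* movl -4(%ebp), %eax: return value     */
}
```

For example, calling asum_ref("AB") adds the ASCII codes 65 and 66, which matches what the driver would print as the "ascii sum" for that input.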
    int a, b, c, d, e;
    a = ((b + c) - (d + e)) - 10;
Code 2.1.1
The expression in Code 2.1.1 performs four computations in one statement using five variables (a, b, c, d, and e) and a constant (the number 10). In assembly language we cannot write such a complex statement. In x86 assembly language, each instruction can perform only one computation at a time and may reference at most one memory variable per computation. At least one of the required data (i.e. arguments to the instruction) must be in a register. To start, the four general-purpose registers provided in the 80386 are A, B, C, and D. Thus, the previous example would look like this:
    .comm a, 4
    .comm b, 4
    .comm c, 4
    .comm d, 4
    .comm e, 4
    .text
        movl b, %eax        # move variable b into register A
        addl c, %eax        # add variable c to register A
        movl d, %ebx        # move variable d into register B
        addl e, %ebx        # add variable e to register B
        subl %ebx, %eax     # subtract register B from register A
        subl $10, %eax      # subtract 10 from register A
        movl %eax, a        # move register A to variable a
Code 2.1.2
Note the comments in Code 2.1.2 above. In assembly, a hash or pound symbol (#) is interpreted as the start of an inline comment. Unlike in C, double slashes (//) and block comments (/* */) cannot be on the same line as assembly code; however, they can be used on lines without assembly code, such as the program and function header comments. Placing these types of comments on the same line as assembly code will generate a compile-time error. (To avoid this, use the pound symbol as in the example above.)
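To see the "one computation per instruction" idea from the C side, the sketch below rewrites the expression from Code 2.1.1 so that every statement does exactly one thing. The variables eax and ebx are ordinary C integers standing in for the A and B registers, and the function name compute_a is our own; this is an illustration of the decomposition, not real register access.

```c
/* Step-by-step model of Code 2.1.2. Each C statement corresponds to the
   single assembly instruction named in its comment. */
int compute_a(int b, int c, int d, int e)
{
    int eax, ebx;     /* stand-ins for registers A and B */

    eax = b;          /* movl b, %eax     */
    eax += c;         /* addl c, %eax     */
    ebx = d;          /* movl d, %ebx     */
    ebx += e;         /* addl e, %ebx     */
    eax -= ebx;       /* subl %ebx, %eax  */
    eax -= 10;        /* subl $10, %eax   */
    return eax;       /* movl %eax, a     */
}
```

Calling compute_a(20, 5, 3, 2) should give the same value as ((20 + 5) - (3 + 2)) - 10, which is a quick way to convince yourself that the seven small steps and the single C statement agree.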
2.2 Data Storage and Variables
Let's break this down piece by piece. First of all, in order to declare a variable, we use a statement that will define a storage location and assign a name or symbol to that location (or address). Actually, this isn't an instruction at all but an assembler directive. These are commands to the assembler program invoked by GCC to perform some action; in this case, to reserve memory for a variable. There are several of these directives that can be used to reserve memory. Which one we use depends on what size block of memory we want to allocate (similar to the data types char, short, int, and long in C). In assembly, there are directives used to allocate space for uninitialized variables, and directives used to reserve memory and initialize variables. The most common directive used in this course is .comm, which creates a symbol (or variable, as it is sometimes called) with the name given as the first argument and reserves the number of bytes listed as the second argument. This variable name is actually a placeholder for the address in memory where the space is allocated. At assemble time, all variable names are replaced by their respective memory addresses. Note that there is no type information associated with the memory or the symbol.
Alternatively we could have chosen to initialize the allocated space to some value. In C we could have said:
    int a;          /* uninitialized */
    int b = 10;     /* decimal */
    int c = 0x20;   /* hexadecimal */
    int d = 'a';    /* ascii */
    int e = 040;    /* octal */
    int f = 024;    /* C does not have a binary type */
                    /* this is octal */
Code 2.2.1
which, in assembly language, would be:
    .comm a, 4              # declare variable a as 4 bytes (4B)
    b: .int 10              # declare var b; init to 10
    c: .int 0x20            # declare var c; init to 0x20
    d: .int 'a'             # declare var d; init to 'a'
    e: .int 040             # declare var e; init to octal 040
    f: .int 0b000010100     # declare variable f
Code 2.2.2
In Code 2.2.2, note the syntax for expressing values in different number bases, including the octal and binary syntax, the latter of which does not exist in C. The symbol created is defined by the label to the left of the colon on each line. (We will discuss labels in greater detail in Lab 3.) The value it is initialized to is located to the right of the directive .int. Other directives include .byte, .hword, .long, .quad, and .octa to initialize 1-, 2-, 4-, 8-, and 16-byte integers, respectively. (Note that on x86 the GNU assembler's .word directive emits 2 bytes, the same as .hword.) Likewise, for floating point numbers, .float, .single, and .double are directives to initialize 4-, 4-, and 8-byte floating point numbers, respectively. (Note .float and .single both initialize 4-byte floating point numbers.)
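If the different number bases are unfamiliar, a short C check can confirm that the literals used above all describe ordinary integers. The sketch below is only an illustration: the typedef names and the function literals_agree are hypothetical, and the pairing of C fixed-width types with assembler directives is our assumption about matching sizes, since the assembler itself attaches no type to a symbol.

```c
#include <stdint.h>

/* Fixed-width C types whose sizes match some of the data directives.
   The names are made up for this example. */
typedef int8_t  asm_byte;    /* same size as .byte  (1 byte)  */
typedef int16_t asm_hword;   /* same size as .hword (2 bytes) */
typedef int32_t asm_int;     /* same size as .int   (4 bytes) */
typedef int64_t asm_quad;    /* same size as .quad  (8 bytes) */

/* Returns 1 if the literals from the listings have the values we expect. */
int literals_agree(void)
{
    asm_int c = 0x20;   /* hexadecimal 20 is decimal 32             */
    asm_int d = 'a';    /* the ASCII code of 'a' is 97              */
    asm_int e = 040;    /* octal 40 is also decimal 32              */
    asm_int f = 024;    /* octal 24 is decimal 20, binary 10100     */

    return c == 32 && d == 97 && e == 32 && f == 20;
}
```

Note that c and e hold the same value even though they are written in different bases, which is exactly why the assembler needs the 0x, 0, and 0b prefixes to tell them apart.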
2.3 Moving Data, Addition, Subtraction, and Constants
Before we begin, note that in x86 assembly there are two popular syntaxes: Intel syntax and AT&T syntax. Although we are writing code for an Intel x86-based processor, we will use AT&T syntax. Why? Well, GNU GCC works natively with AT&T syntax. In order for us to compile our programs with GCC, we must use this syntax. There are no pros or cons to one or the other; they are simply different ways of doing the same thing. Please note that both syntaxes are directly mapped to the Intel x86 machine language, so there are no compute differences at runtime. Now, let's get started:
Now that we've mastered the move instruction, let's move on to addition and subtraction in assembly language.
In C and other high-level languages, it is fairly common to add more than one source and store the result in a different destination, all on a single line, as shown in Code 2.3.5 below:
dst = src1 + src2;
Code 2.3.5
However, it is not possible to perform such a complex addition in assembly language. What we must do instead is break this addition up into many smaller addition operations. To facilitate this, addition in assembly language works by adding one argument to another, as shown in Code 2.3.6:
dst = dst + src; # or equivalently: dst += src;
Code 2.3.6
In assembly language, to perform the simple addition in Code 2.3.6, we would write:
addl src, dst # add src to dst and store result in dst
Code 2.3.7
But remember, as was true for the movl instruction, in Code 2.3.7 either dst, or src, or both must be a register. In Code 2.3.8 below, in order to add two variables, we must first move one to a register, then perform the addition.
    int a, b;
    a += b;
Code 2.3.8
The equivalent in assembly is:
    .comm a, 4          # reserve 4 bytes of space for a
    .comm b, 4          # reserve 4 bytes of space for b
    movl b, %eax        # first copy variable b to the A register
    addl %eax, a        # add the A register (var b) to variable a
Code 2.3.9
So, continuing our initial example in Code 2.3.5, if we want to add one variable to another and store the result in a different variable, we must first move one into a register, perform the addition, and then copy the result to the desired destination. For instance:
    int dst, src1, src2;
    dst = src1 + src2;
Code 2.3.10
is written in assembly language as:
    .comm dst, 4        # reserve 4B of space for dst,
    .comm src1, 4       # src1,
    .comm src2, 4       # and src2
    movl src1, %eax     # copy variable src1 to register A
    addl src2, %eax     # add src2 to src1 (in register A); store the result in A
    movl %eax, dst      # copy the result to variable dst
Code 2.3.11
See, it's that easy; we just need to get accustomed to thinking in smaller steps.
Now, subtraction in assembly works just like addition. So, the following operation in C:
    int a, b, c;
    a = b - c;
Code 2.3.12
is written in assembly language as:
    .comm a, 4          # reserve 4 bytes of space for each variable
    .comm b, 4
    .comm c, 4
    movl b, %eax        # copy variable b to register A
    subl c, %eax        # subtract c from b (in register A); the result stays in A
    movl %eax, a        # move the result of b - c to variable a
Code 2.3.13
Notice in Code 2.3.12 and Code 2.3.13 that subtraction (just like addition) takes two arguments, where the first is the source and the second is the destination. For both addition and subtraction, it is very important to note the "add to" and "subtract from" behavior implemented by addl and subl, respectively. The destination argument is not simply the destination; the data present in the destination argument will first be used as part of the computation (i.e. dst +/- src), then it will be overwritten with the result (i.e. dst = dst +/- src). As such, if the original data in the destination argument is important, be sure to movl it somewhere else (i.e. copy it) so that it is not lost after the computation.
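The "add to" and "subtract from" behavior can be mimicked in C by passing the destination by pointer, so that the function both reads and overwrites it. The helper names addl_model, subl_model, and sub_keep_b below are hypothetical; they are a sketch of the two-operand semantics described above and do not correspond to anything in the lab files.

```c
/* Hypothetical models of the two-operand form "op src, dst":
   the destination is read first, then overwritten with the result. */
void addl_model(int src, int *dst) { *dst = *dst + src; }   /* addl src, dst */
void subl_model(int src, int *dst) { *dst = *dst - src; }   /* subl src, dst */

/* Computes b - c without destroying b, following the pattern of
   copying the value to a register before operating on it. */
int sub_keep_b(int b, int c)
{
    int eax = b;            /* movl b, %eax: work on a copy of b     */
    subl_model(c, &eax);    /* subl c, %eax: eax = eax - c           */
    return eax;             /* movl %eax, a: the caller stores this  */
}
```

Because the subtraction is applied to the copy in eax, the caller's b is untouched afterwards, which is the reason for the initial movl in the assembly version.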
Lastly, just as we can specify a constant in C to use in a computation, we can specify a constant in assembly language. Constants in assembly are preceded by the $ symbol:
significant bit is on the right: bit 0. If we want to access or store 8-bit data in the register, we can use either bits 15 through 8 or bits 7 through 0. The former can be accessed by referring to the register as %ah, %bh, %ch, or %dh, depending on which register we want to use. The latter can be accessed by referring to the register as %al, %bl, %cl, or %dl. As mentioned previously, the h stands for the high-order bits (15 to 8) of bits 15 to 0; the l stands for the low-order bits (7 to 0) of bits 15 to 0. So theoretically, if we wanted, we could store two 8-bit values in a single register by storing one using %ah and the other using %al. As the table illustrates, they would be in two different physical locations within the same register.
What about 16-bit data types? They can be referenced the following ways:
16-bit: %ax; %bx; %cx; %dx
Code 2.4.2
These four registers in Code 2.4.2 represent the least-significant 16 bits of the total 32 available bits in the general-purpose registers. Note that, as shown in Table 2.4.1, these are the exact same 16 bits used for referencing 8-bit data sizes; only they are being referenced as all 16 at once, as opposed to 15 to 8 and 7 to 0 separately. The x in the syntax stands for extended, meaning it extends the number of bits referenced from 8 bits to 16 bits.
Last, but certainly not least is the 32-bit data size. It is the size most frequently used in this course and in most assembly programs. It is also the register size used in the previous move, addition, and subtraction examples in the prior sections of this lab, so its syntax should look familiar. It can be referenced the following ways:
32-bit: %eax; %ebx; %ecx; %edx
Code 2.4.3
The syntax in Code 2.4.3 above represents all 32 bits of the register for the A, B, C, and D general-purpose registers, respectively. The e in the register name stands for extended and the x stands for extended as well. Originally, when the first Intel 80XXX processor was developed, there were only 8-bit registers. Therefore, as the family of processors matured and technology increased in sophistication, the new 16-bit processors eXtended the 8-bit ones, and when the time came around, new 32-bit processors Extended the older 16-bit ones. As seen in Table 2.4.1, working with 32-bit data leverages all available bits in the register. However, as explained previously, these same 32 bits can be accessed 16 or 8 at a time, depending on the syntax used to reference the register.
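One way to picture how the 8-, 16-, and 32-bit names overlap is to treat the full register as a 32-bit integer and pick out the named pieces with masks and shifts. The following C sketch does this for the A register; the function names are our own, and the code models only the bit positions described above, not any real hardware access.

```c
#include <stdint.h>

/* Views of a 32-bit value standing in for %eax (function names are made up). */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }        /* bits 15..0 */
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }    /* bits 15..8 */
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }           /* bits 7..0  */

/* Storing through the %al name replaces bits 7..0 and leaves bits 31..8 alone. */
uint32_t write_al(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFFFF00u) | value;
}
```

With eax holding 0x12345678, reg_ax gives 0x5678, reg_ah gives 0x56, and reg_al gives 0x78, showing that the three names refer to overlapping parts of the same storage.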
Aside: Although it is not a part of this course, 64-bit system architectures and operating systems (x86-64) are becoming more prevalent. Their registers work in the same fashion, but to access all 64 bits of information, one must reference them as %rax, for example. 32-, 16-, and 8-bit accesses work the same as described above.
Now, when we refer to a register in an instruction, the size of the register must match the size of the opcode. The opcode is merely a fancy name for the bits that characterize the instruction or operation being performed. Assembly instructions, in addition to the data they operate on, are also represented in the computer in binary coding; this is called the opcode. Note these instructions are specific to the size of data we want to work with. In 80386 assembly language, instructions can be used with 1-, 2-, or 4-byte data, specified with an opcode suffix of either b (for byte), w (for word), or l (for long-word), respectively. Recall that all of the assembly instructions in the earlier examples in this lab have used 4-byte long-words; thus, all of the opcodes have had an l suffix, as in addl, movl, subl. This is the most common data size we will work with. But, be aware that there are also instructions for other data sizes, such as addb, divb, movb for 1-byte data. We need to be careful and match the opcode suffix with the correct register reference. Thus, to match the opcode with the parameters, the instruction:
addb $2, %al
Code 2.4.4
is an 8-bit operation, where the b and %al correspond to 8-bit instructions. On the other hand:
addw $2, %ax
Code 2.4.5
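is a 16-bit operation, where the w and the %ax correspond to 16-bit instruction syntax. Table 2.4.2 summarizes opcode suffixes:
    Suffix   Name        Data size
    b        byte        1 byte (8 bits)
    w        word        2 bytes (16 bits)
    l        long-word   4 bytes (32 bits)
Table 2.4.2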
Remember, the instruction and the data size need to match up in order to compile without errors. For example, if we want to add 32-bit data sizes, the opcode suffix needs to be l, making addl, and the parameters to the instruction addl need to be variables declared as 4 bytes or registers using the 32-bit syntax, %eax or %ebx for example.
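The practical effect of the suffix is easiest to see when a result does not fit in the chosen size. The C sketch below models addb, addw, and addl acting on the A register, using a 32-bit integer as a stand-in for %eax. The function names are hypothetical, and the model covers only the value left in the register (it ignores the processor flags).

```c
#include <stdint.h>

/* addb $imm, %al: 8-bit add. Only bits 7..0 change; the sum wraps at 256. */
uint32_t addb_al(uint32_t eax, uint8_t imm)
{
    uint8_t al = (uint8_t)(eax & 0xFFu);
    al = (uint8_t)(al + imm);
    return (eax & 0xFFFFFF00u) | al;
}

/* addw $imm, %ax: 16-bit add. Only bits 15..0 change; the sum wraps at 65536. */
uint32_t addw_ax(uint32_t eax, uint16_t imm)
{
    uint16_t ax = (uint16_t)(eax & 0xFFFFu);
    ax = (uint16_t)(ax + imm);
    return (eax & 0xFFFF0000u) | ax;
}

/* addl $imm, %eax: 32-bit add across the whole register. */
uint32_t addl_eax(uint32_t eax, uint32_t imm)
{
    return eax + imm;
}
```

Starting from 0x1234FFFF and adding 2, the 16-bit version leaves the upper half at 0x1234 while the 32-bit version carries into it, so choosing the wrong suffix can silently change a result even when the code assembles.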
2.5 Multiplication and Division
In assembly, the multiplication and division instructions are somewhat more complex than the other operations we have discussed. Let's start with multiplication. First, there are two versions: multiplication for signed integers, imull, and multiplication for unsigned numbers, mull. We will discuss mull in detail; however, keep in mind that imull is the alternative for signed integers.
The mull instruction has a single operand (which can be a variable or a register). The value of this operand is multiplied by the A register, and the result is placed back in the A register, and potentially the D register. Yes, that’s right, the mull instruction can have potentially two destination registers. Also note that one parameter to the multiplication instruction is assumed to be in the A register. So, this means that if we want to multiply the contents of register B and register C we cannot do: