Getting started with x86

Updates

Keep track of this section for any updates on this webpage.

Overview

The x86 instruction set has evolved organically over the decades, so it is quite complex. One goal of this document is to explain a small set of instructions that we need in this course. Another goal is to help you make the transition from generating SaM code to generating x86 code.

Resources

There is an excellent high-level introduction to the x86 ISA. You can find it here.
SASM is a cross-platform IDE for developing x86 assembly programs. Like SaM, it has an intuitive GUI and to the extent that writing x86 assembler code can be fun, SASM makes it fun. See the SASM website for more information.
You can, of course, read Intel’s ISA manuals.

Differences between the x86 ISA and SaM ISA

x86 has data registers
In this document, we use registers eax and ebx for storing data values. Register esp is the stack pointer and register ebp is the frame base register. All these are 32-bit registers. The picture below (from (1) above) is useful; ignore the sub-fields of some of the registers such as eax.
The x86 stack grows towards smaller addresses.
In particular, the stack pointer esp will decrease when a value is pushed on the stack.
x86 is a 2-address ISA
Arithmetic instructions like add or sub require two operands, and the result overwrites the first operand. For example, the instruction add ebx, eax adds the contents of registers ebx and eax, and stores the result in register ebx.
- For some instructions, one of the operands can be in memory. We will use the register-indirect and register-indirect-with-offset addressing modes. For example add eax, [ebp] reads the contents of memory at the address pointed to by ebp and adds the value to register eax. This is different from add eax, ebp, which adds the contents of ebp to eax. The instruction add eax, [ebp+8] performs a similar operation but with the contents of memory at address ebp+8.
- The second operand can be a constant; for example, add esp, 4 adds 4 to the stack pointer esp.
The x86 ISA supports operations on data values of different lengths
Including bytes (8 bits), words (16 bits), and double words (32 bits). To avoid complexity, we restrict ourselves to 32-bit integers and addresses.
x86 has 32-bit addresses but each address refers to a byte in the address space.
Even if we restrict ourselves to 32-bit integers and addresses, we must be aware that addresses refer to bytes and not to words or double words.
- For example, to get rid of a 32-bit integer from the top of the stack, you must write add esp, 4 and not add esp, 1.
- Some instructions need to be explicitly told the length of the operands. For example, mov [esp], 2 is ambiguous: do you want to update a byte, word or double word on top of the stack? If you use SASM, you have to write mov dword [esp], 2 to specify that the intended length of the operation is a double word. On the other hand, mov eax, 2 works fine because the assembler knows that eax is a 32-bit register so it deduces that the intended length is a double word. SASM does not care if you specify dword where it is not needed, so you can write mov dword eax, 2 if you want.
x86 has condition code registers.
When an arithmetic operation is performed, bits in a special register called the condition code register (CCR) are set depending on the result of the operation. Conditional jumps refer to these bits to determine which way to jump. For example, one bit in the CCR is called the zero (Z) bit, and it is set if the result of an instruction such as addition or subtraction is zero; similarly, the negative (N) bit is set if the result is negative. Conditional jump instructions refer to these bits: for example, jz foo is jump-if-zero and takes the branch if the Z is set. Some instructions do not change the CCR. See (1) in the resources section for a more detailed discussion.

x86 assemblers

One advantage of SaM is that there is no other SaM, and Pingali is its prophet. For x86 assembly code, on the other hand, there are a bewildering number of different assemblers, and you will see acronyms like NASM (network assembler), MASM (Microsoft assembler), GAS (GNU assembler), AT&T syntax, Intel syntax, etc. Each one is different and assembly programs produced for one assembler will usually not work with other assemblers. This means that x86 code you get from the Internet may not work with your assembler. Another issue is system calls: assembly programs that make Windows systems calls for, say, printing values will not work on Linux because system calls are different on the two operating systems. A final issue is linking with routines in libraries like libc: if you want to link to these routines, you must use the standard protocol for calling these routines.

In this course, we will use the NASM syntax, which is supported by the SASM IDE. It is simpler than the others. The documentation for SASM says that it supports MASM and other formats but you should not use these since it complicates grading. One advantage of SASM is that it has its own routines (macros) for I/O, and these are translated by the SASM assembler into the appropriate system calls for whatever platform you are generating code for. This means you do not need to worry about system calls at least for I/O, and you get a level of portability that is convenient.

Here is a simple SASM assembly file. The program is in the .text section. The entry point into your code must be labeled CMAIN, and it must be declared to be global. The code calls a SASM print routine to print the 32-bit (4 bytes) integer 666 and then returns.

%include "io.inc"
section .text
    global CMAIN
CMAIN:
    push ebp; set up frame base register
    mov ebp, esp
    PRINT_DEC 4, 666
    pop ebp; restore frame base register and return
    ret

More complicated programs will have a .global section where global variables like strings are allocated. See the file for factorial in SASM.

Generating x86 code from Bali

There are two ways you can modify your Bali -> SaM compiler to generate x86 code.

1. Expand each SaM instruction

In this method, your compiler would generate SaM code and then translate each SaM instruction into small sequences of x86 instructions. For example, the SaM instruction ADD can be implemented by the following x86 sequence:

pop ebx
pop eax
add eax, ebx
push eax

Similarly, the LINK instruction in SaM can be implemented by the sequence

push ebp
mov ebp, esp

This is essentially what binary translators and just-in-time (JIT) compilers do. I haven’t worked out the details so I don’t know if there are any hidden gotchas with this approach. Since the stacks in SaM and x86 grow in opposite directions, there may be some subtle issues with stack manipulation.

2. Generate x86 directly

Another way is to completely retarget your compiler to produce x86 code directly. Here are the key points to keep in mind.

Stack frames:
The picture below shows the stack frame organization on x86 for C programs.
- Note the difference between the C stack frame and SaM stack frame. The parameters in the C stack frame are pushed in right-to-left order as opposed to left-to-right order in the SaM stack frame.
- There is no return value slot in the frame because register eax is used to return a value from a method invocation.
- Notice that the order of the saved FBR (ebp) and the return address is the reverse of what it is in SaM. In particular, the callee, instead of the caller, is responsible in saving the FBR and updating the FBR to point to the new frame (achieved by the LINK instruction in SaM) using the instruction sequence discussed above.
- x86 has a call instruction, which is like JSR in SaM. It pushes the return address and jumps to the specified address. There is a matching ret instruction that pops an address from the stack and jumps to it. You should use these instructions.
Since method invocations return values in eax, it may be convenient to require all expressions to return their value in eax rather than push the value on the top of the stack as we did in SaM. Thus, Bali expression “2” would be compiled to the instruction mov eax, 2. The expression “x” would be compiled into something like mov eax, [ebp+8], assuming x is stored at an offset of 8 from the frame base register. The expression e1-e2 would be compiled to the following code:
```
code for e1; value left in eax
push eax; save value on the stack since e2 will overwrite this register
code for e2
pop ebx;  ebx now has the value of e1
sub ebx, eax
mov eax, ebx
```
For commutative operations like +, you can avoid the last mov by generating the code add eax, ebx, for example. This approach has the virtue of simplicity, but you may be able to get more compact code by exploiting some of the memory addressing modes of x86. Whether that code runs faster is another story. Feel free to experiment.
As mentioned above, conditional jumps use the flags in the condition code register. The cmp instruction is useful for setting these flags; for example, cmp eax, ebx subtracts the value of ebx from the value in eax and sets the flags based on the result. It does not change the value in eax. Arithmetic operations like add and sub also set these flags.