Getting started with x86

 

Updates

Check this section for any updates to this webpage.

Overview

The x86 instruction set has evolved organically over the decades, so it is quite complex. One goal of this document is to explain a small set of instructions that we need in this course. Another goal is to help you make the transition from generating SaM code to generating x86 code.

Resources

  1. There is an excellent high-level introduction to the x86 ISA. You can find it here.
  2. SASM is a cross-platform IDE for developing x86 assembly programs. Like SaM, it has an intuitive GUI, and to the extent that writing x86 assembly code can be fun, SASM makes it fun. See the SASM website for more information.
  3. You can, of course, read Intel’s ISA manuals.

Differences between the x86 ISA and SaM ISA

x86 assemblers

One advantage of SaM is that there is no other SaM, and Pingali is its prophet. For x86 assembly code, on the other hand, there is a bewildering number of different assemblers and syntaxes, and you will see names like NASM (Netwide Assembler), MASM (Microsoft Macro Assembler), GAS (GNU Assembler), AT&T syntax, Intel syntax, etc. Each assembler is different, and assembly programs written for one will usually not work with another. This means that x86 code you get from the Internet may not work with your assembler. Another issue is system calls: assembly programs that make Windows system calls for, say, printing values will not work on Linux because system calls are different on the two operating systems. A final issue is linking with routines in libraries like libc: if you want to call these routines, you must follow the standard calling convention that they expect.

In this course, we will use the NASM syntax, which is supported by the SASM IDE. It is simpler than the others. The documentation for SASM says that it supports MASM and other formats, but you should not use these since doing so complicates grading. One advantage of SASM is that it has its own routines (macros) for I/O, and these are expanded into the appropriate code for whatever platform you are generating code for. This means you do not need to worry about system calls, at least for I/O, and you get a convenient level of portability.

Here is a simple SASM assembly file. The program is in the .text section. The entry point into your code must be labeled CMAIN, and it must be declared to be global. The code calls a SASM print macro to print the 4-byte (32-bit) integer 666 and then returns; the first argument of PRINT_DEC is the size of the value in bytes.

%include "io.inc"
section .text
    global CMAIN
CMAIN:
    push ebp            ; set up frame base register
    mov ebp, esp
    PRINT_DEC 4, 666
    pop ebp             ; restore frame base register and return
    ret

More complicated programs will have a .data section where global variables like strings are allocated. See the file for factorial in SASM.
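Because every SASM program shares the skeleton in the example above, a compiler back end can emit that skeleton around whatever instructions it generates. The sketch below is written in Python purely for illustration (nothing here assumes the language your compiler is written in). The helper name `sasm_program` is hypothetical, and the sketch assumes the generated body leaves the stack balanced so that the final `pop ebp` restores the caller's frame base register.

```python
def sasm_program(body):
    """Wrap generated x86 lines in the SASM skeleton from the example above.

    `body` is a list of instruction strings in NASM syntax. Assumption: the
    body pops everything it pushes, so `pop ebp` sees the saved frame base.
    """
    lines = [
        '%include "io.inc"',
        "section .text",
        "    global CMAIN",
        "CMAIN:",
        "    push ebp            ; set up frame base register",
        "    mov ebp, esp",
    ]
    lines += ["    " + instr for instr in body]
    lines += [
        "    pop ebp             ; restore frame base register and return",
        "    ret",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the hand-written example: print the 4-byte integer 666.
    print(sasm_program(["PRINT_DEC 4, 666"]))
```

Keeping the skeleton in one place means the rest of the code generator only has to produce the body instructions.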

Generating x86 code from Bali

There are two ways you can modify your Bali -> SaM compiler to generate x86 code.

1. Expand each SaM instruction

In this method, your compiler would generate SaM code and then translate each SaM instruction into a short sequence of x86 instructions. For example, the SaM instruction ADD can be implemented by the following x86 sequence:

pop ebx
pop eax
add eax, ebx
push eax

Similarly, the LINK instruction in SaM can be implemented by the sequence:

push ebp
mov ebp, esp

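This is essentially what binary translators and just-in-time (JIT) compilers do. I haven’t worked out the details, so I don’t know if there are any hidden gotchas with this approach. Since the stacks in SaM and x86 grow in opposite directions (the SaM stack grows toward higher addresses, while the x86 stack grows toward lower addresses), there may be some subtle issues with stack manipulation, particularly with offsets relative to the frame base register.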
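The per-instruction expansion described in this method can be sketched as a small lookup table. The Python below is an illustration only, not a worked-out translation scheme: the ADD and LINK entries are the sequences given above, while the SUB, PUSHIMM and PUSHOFF expansions, the function names `expand` and `translate`, and the rule that one SaM stack cell becomes one 4-byte x86 slot are all assumptions. The PUSHOFF case shows one plausible way to handle the opposite stack directions: negate the SaM offset and scale it to bytes.

```python
WORD = 4  # assumption: each SaM stack cell maps to one 4-byte x86 stack slot

# Expansions for SaM instructions that take no operand.
# ADD and LINK come from the text; SUB is an assumed analogue of ADD.
FIXED = {
    "ADD": ["pop ebx", "pop eax", "add eax, ebx", "push eax"],
    "SUB": ["pop ebx", "pop eax", "sub eax, ebx", "push eax"],
    "LINK": ["push ebp", "mov ebp, esp"],
}


def expand(instr):
    """Return the x86 lines (NASM syntax) for one SaM instruction."""
    parts = instr.split()
    if not parts:
        raise ValueError("empty instruction")
    op, args = parts[0].upper(), parts[1:]
    if op in FIXED and not args:
        return list(FIXED[op])
    if op == "PUSHIMM" and len(args) == 1:
        return [f"push dword {int(args[0])}"]
    if op == "PUSHOFF" and len(args) == 1:
        # SaM offsets count cells upward from FBR; the x86 stack grows
        # downward, so the sign flips and the offset is scaled to bytes.
        offset = -WORD * int(args[0])
        return [f"push dword [ebp{offset:+d}]"]
    raise ValueError(f"no expansion defined for {instr!r}")


def translate(sam_lines):
    """Expand a list of SaM instructions, keeping each one as a comment."""
    out = []
    for line in sam_lines:
        out.append(f"    ; {line}")
        out.extend("    " + x for x in expand(line))
    return out


if __name__ == "__main__":
    print("\n".join(translate(["PUSHIMM 2", "PUSHIMM 3", "ADD"])))
```

Emitting the original SaM instruction as a comment above each expansion makes the generated assembly much easier to debug in SASM. Instructions missing from the table raise an error instead of being silently dropped.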

2. Generate x86 directly

Another way is to completely retarget your compiler to produce x86 code directly. Here are the key points to keep in mind.