Search-engine friendly clone of the ACL2 documentation.

X86isa

Program-execution

Setting up the x86 ISA model for a program run.

IMPORTANT: Note that if these books were built using any other value of X86ISA_EXEC except t, then instructions like SYSCALL and RDRAND will not be available for execution, though reasoning about them will still be possible. See x86isa-build-instructions for details.

First, obtain the x86 machine-code version of the program you want to execute on the model. Note that we support only ELF (created on Linux machines) and Mach-O binaries (created on Darwin/Apple machines).

Here is a sample popcount program in C that will be used for illustration throughout this section.

// FILENAME: popcount.c 
// From Sean Anderson's Bit Twiddling Hacks 
// See https://graphics.stanford.edu/~seander/bithacks.html 
 
#include <stdio.h> 
#include <stdlib.h> 
 
int popcount_32 (unsigned int v) { 
  v = v - ((v >> 1) & 0x55555555); 
  v = (v & 0x33333333) + ((v >> 2) & 0x33333333); 
  v = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; 
  return(v); 
} 
 
int main (int argc, char *argv[], char *env[]) { 
  int v; 
  printf ("\nEnter a 32-bit number: "); 
  scanf  ("%d", &v); 
  v = v & 0xffffffff; 
  printf ("\nPopcount of %d is: %d\n", v, popcount_32(v)); 
  return 0; 
}

We can use the following command to obtain the x86 binary corresponding to this program; we will call this binary popcount.o.

gcc popcount.c -o popcount.o

You can see the assembly and machine code corresponding to this program using utilities like objdump on Linux machines and otool on Mac machines. Example invocations are as follows:

On Linux machines:

objdump -d popcount.o

On Mac machines:

otool -tvj popcount.o

Note that the assembly and machine code can differ from machine to machine, program run to program run. All the concrete values of addresses, etc. used in this example can be different for you.

The following events describe the process of setting up the x86 ISA model for the execution of popcount.o.

Include the top-level book (i.e., X86ISA/top.lisp) in a fresh ACL2 session.
```
(include-book "top" :ttags :all)
```
Alternatively, it can be faster to just include tools/execution/top.lisp.
```
(include-book "tools/execution/top" :ttags :all)
```
You should always be in the X86ISA package when working with the x86 books. If you start an ACL2 session when standing under any of the x86 directories, your ACL2 session will start in the X86ISA package. Otherwise, you also need the following form after the above include-book.
```
(in-package "X86ISA") 
```
If you desire to run the model in the app-view, skip this step. Use the function init-sys-view to switch the model to the system-level view and load our default configuration of 1G page tables (set up to provide an identity mapping from linear to physical addresses) into the model's memory at the provided address, i.e., 0x0 in our example. Of course, this is a contrived example. For more flexibility in constructing and loading page tables, see Setting-up-page-tables.
```
(init-sys-view 0 x86) 
```
Read and load popcount.o into the x86 model's memory using the macro binary-file-load.
```
 (binary-file-load "popcount.o" :elf t) ;; or :mach-o t
```
binary-file-load will fail if popcount.o is not a supported ELF or Mach-O file.
Use init-x86-state-64 to modify other components of the x86 state, like the instruction pointer, registers, arguments in memory, if necessary. This function only writes the specified values into the x86 state while preserving any values previously written to the x86 state. For example, the following form will not make changes to any general-purpose registers except rdi and rsp. Note that the function init-x86-state-64 sets up the state to be in 64-bit mode of operation of an x86 machine --- see init-x86-state for an initialization function that does not do so.
```
(init-x86-state-64 
 ;; Status (MS and fault field) 
 nil 
 ;; Start Address --- set the RIP to this address 
 #x100000f12 
 ;; Initial values of General-Purpose Registers 
 '((#.*rdi* . #xff00ff00) 
   (#.*rsp* . #.*2^45*)) 
 ;; Control Registers: a value of nil will not nullify existing 
 ;; values. 
 nil 
 ;; Model-Specific Registers: a value of nil will not nullify existing 
 ;; values. 
 nil 
 ;; Segment Registers: a value of nil will not nullify existing values. 
 nil ; visible portion 
 nil nil nil ; hidden portion 
 ;; Rflags Register 
 2 
 ;; Memory image: a value of nil will not nullify existing values. 
 nil 
 ;; x86 state 
 x86) 
```
Aside: Some other ways to initialize the x86 state are listed below. The list is not exhaustive.
- The memory image argument of init-x86-state accepts alists satisfying n64p-byte-alistp. This can be used to provide a program binary in the form of a list of address-byte pairs rather using binary-file-load to parse and load the binary automatically.
- The function load-program-into-memory also accepts programs that satisfy n64p-byte-alistp.
- Of course, wml08 (and its friends like write-bytes-to-memory) can also be used to write a program to the memory. Initialization of other components of the x86 state can be done by using the appropriate updater functions directly. For example, !stri can be used to update system registers like GDTR and IDTR when operating in the system-level view.
All the mechanisms to initialize the x86 state aside, how do we know what values to put in the x86 state? This is an important and interesting question. Its answer depends on the program (or snippet of a program) we intend to execute and requires the user to be familiar with x86 assembly and calling conventions. For this popcount example, suppose all we wanted to execute was the popcount_32 routine with an actual concrete argument, say 0xff00ff00. One way to figure out what the start and halt addresses should be in this case is to look at the output from objdump or otool for instructions in the main routine that look like the following:
```
... 
100000f10: 89 c7                mov    %eax,%edi 
100000f12: e8 49 ff ff ff       callq  100000e60 <_popcount_32> 
100000f17: 48 8d 3d 66 00 00 00 lea    0x66(%rip),%rdi 
... 
```
We set the start address to be the address of the instruction that calls the popcount_32 function (i.e., 0x100000f12 here) and halt address to be the address of the instruction following it (i.e., 0x100000f17 here).

Before entering popcount_32, the component of the x86 state that contains the concrete argument (i.e., 0xff00ff00 in our example) should be initialized appropriately too. How do we know which component is used for this purpose? Knowledge of calling conventions can help here --- for example, on 64-bit Linux, if the argument is of the integer type, then the next available register of the following sequence is used for argument passing: rdi, rsi, rdx, rcx, r8, and r9. For more details, see this and this. For our example program, the register rdi (more accurately, register edi --- the lower 32 bits of the register rdi) is used to pass the concrete argument to popcount_32. We can confirm this by inspecting the assembly. For example, in the main routine, before the call to popcount_32, we might observe an instruction that moves the argument to edi --- see the instruction at address 100000f10 in the assembly snippet above. Another clue can be the assembly corresponding to the popcount_32 routine, where we might see an instruction moving the value in edi to the stack --- see the instruction at address 100000e64 below.
```
... 
0000000100000e60 <_popcount_32>: 
   100000e60:   55                     push   %rbp 
   100000e61:   48 89 e5               mov    %rsp,%rbp 
   100000e64:   89 7d fc               mov    %edi,-0x4(%rbp) 
... 
```
It should be emphasized that it is the user's responsibility to ensure that the state is initialized "correctly", i.e., the program does not overlap with the page tables, the stack pointer is initialized so that the stack does not run out of memory nor does it overwrite the program during execution (in our example, 2^45 is the initial value of the stack pointer for this very reason), etc. Essentially, the user takes on the job of the operating system plus the compiler/linker, etc. Unless the program of choice is being executed "on top of" these system programs which are also being executed on the model, there is probably no way to remove this burden from the user.
Run the program using x86-run or x86-run-steps or x86-run-halt. To run one instruction only, use x86-fetch-decode-execute. You can also see Dynamic-instrumentation for details about dynamically debugging the program by inserting breakpoints and logging the x86 state into a file, etc.
```
(x86-run-halt  #x100000f17 ;; halt-address 
               10000       ;; limit on number of steps to be run 
               x86) 
 
;; or 
;;    (x86-run-steps 10000 x86) 
 
;; or 
;;    (x86-run 10000 x86) 
```
How do know that the program ran to completion? After executing the above form, we can inspect the contents of the following fields:
```
(fault x86) 
(ms x86) 
(rip x86) 
```
If the fault and ms fields are empty, then the program didn't run to completion and the x86 state is poised to continue execution. If rip is at the halt-address, ms contains a legal halt message, and fault is empty, then the program ran to completion successfully. If you see some other error message in either ms or fault, you need to figure out what went wrong during the program execution --- the Dynamic-instrumentation utilities can help in debugging.

Where did the number 10000 in the argument to x86-run or x86-run-steps or x86-run-halt come from? This number is the clock, i.e., the upper limit on the number of instructions the x86 interpreter will execute. Fewer instructions that this number can be executed if the program reached the halt address sooner or if an error is encountered (in which case the ms field will contain the error message). It might also be the case that this argument to these functions is less than the number of instructions required to run the program to completion. So, how do we pick the value of the clock?

This, again, is up to the user. Guessing the clock value is an answer. In our example, 10000 is large enough --- this example program is small enough that it takes only around a couple dozen instructions to run to completion. You need not worry about the interpreter stepping uselessly after the program halts (or encounters an error) because then, the ms or fault fields will contain a message and these functions will stop executing as soon as these fields are non-nil. On the other hand, if the clock you provided is not sufficient, then you can always execute these functions again and the program execution will continue.
Inspect the output of the program. For this program, register eax contains the return value --- x86-64 Linux calling conventions dictate that rax be the first return register. Of course, as before, we can inspect the assembly to confirm if this is the case.
```
(rgfi *rax* x86) ;; Note that eax is the low 32 bits of rax. 
```
For the value 0xff00ff00, the register rax should contain 16.

If you wish to run this program again in the same ACL2 session, remember to initialize the x86 state appropriately.

Subtopics

Dynamic-instrumentation: Utilities to instrument a program run on the model
Initialize-x86-state: Initializing the x86 state for concrete program execution.
Binary-file-load-fn: Function to read in an ELF or Mach-O binary and load text and data sections into the x86 ISA model's memory.
Read-channel-into-byte-list1
Setting-up-page-tables: Setting up the page tables for a user-level program's run.
Init-zero-page
Load-file-into-memory
Linux-load
Init-sys-view: Switches the model to the system-level view and load our default configuration of 1G page tables.
Load-elf-sections
Chars-to-c-str
String-to-c-str
Pack-u64
Pack-u32
Concrete-simulation-examples
Gdt-entry