Modeling nondeterminism inherent in y86 system calls
Matt Kaufmann
ACL2 Seminar
April 23, 2013

Outline:

I.   Introduction
II.  Syscall-write in the logic and in raw Lisp
III. Oracle implementation
IV.  Logical story

------------------------------------------------------------
I.   Introduction
------------------------------------------------------------

In this talk I present a way to model the nondeterminism inherent in
system calls.  Thanks to Shilpi Goel for helping me to talk this
through.  It might help the reader to be familiar with Soumava Ghosh's
presentation made just before this one in the ACL2 seminar.

I'll focus on syscall-write (but similar considerations apply to
syscall-open, syscall-close, and syscall-read).

Here is a naive approach:

(syscall-write filedes buffer nbytes)

But this isn't a function!  Problem: How can we model syscall-write so
that it's a function?

My first approach, when Soumava wanted help modeling nondeterminism,
was to introduce

(syscall-write clk filedes buffer nbytes)

and then redefine it in raw Lisp.  I used a trusted clause processor
to explain the evaluation results; I could talk about that sometime if
there's interest, but since I abandoned that approach I won't say much
about it now.  The point was to introduce a clock, clk, with the
expectation that each call is made with a different value of clk,
which might be extracted from the y86 state.  But we could get
different values from the same clk in different sessions.  That's
especially problematic if we can prove in two different sessions that
the same call of syscall-write yields two different values, since that
could allow us to certify books with contradictory theorems, and then
include them both to prove nil!

My solution was to store a structure in the y86 state that is richer
than a clock: a so-called oracle.  The oracle is intended to hold all
results of syscall-write calls (and other syscalls) starting with that
y86 state.  That's fine for reasoning, but we don't actually want to
build an oracle when running the model on concrete data.  Thus, we
want a logical story that justifies whatever is actually returned by a
raw-Lisp version of syscall-write.

------------------------------------------------------------
II.  Syscall-write in the logic and in raw Lisp
------------------------------------------------------------

Let's start top-down.  Here are the definitions for syscall-write.
I'll talk about pop-x86-32-oracle later below, but the idea is simple:
it uses the oracle to get the next value for a syscall operation.

; Include the y86 model and introduce a trust tag (to support
; redefinition in raw Lisp, below).

(include-book "y86-basic-new/y86/y86")
(defttag :syscall-raw-impl)

; Here is the simple logical definition, which gets the result by
; popping the oracle.

(defun syscall-write-logic (filedes buffer nbytes x86-32)
  (declare (xargs :stobjs x86-32)
           (ignore filedes buffer nbytes))
  (mv-let (val x86-32)
          (pop-x86-32-oracle x86-32)
          (mv (list (ifix val)) x86-32)))

; The following wrapper is logically defined in terms of the logical
; definition, but is defined differently in raw Lisp (see below).

(defun syscall-write (filedes buffer nbytes x86-32)
  (declare (xargs :stobjs x86-32
                  :guard (and (i32p filedes)
                              (n32p nbytes)))
           (type string buffer))
  (syscall-write-logic filedes buffer nbytes x86-32))

; Next we overwrite the above definition in raw Lisp, using a
; definition based on the one originally provided by Soumava.  I added
; the mv returning the stobj (whose oracle has logically changed), as
; well as an early exit using the logical definition except when we
; have a true "live" stobj whose oracle is of a special, "hidden" sort
; (described in Section III).

(include-book "tools/include-raw" :dir :system)
(include-raw "systemcalls-raw.lsp")

; From systemcalls-raw.lsp:
(defun syscall-write (filedes buffer nbytes x86-32)
  (when (not (and (live-stobjp x86-32)
                  (x86-32-oracle-hidden-p x86-32)))
    (return-from syscall-write
                 (syscall-write-logic filedes buffer nbytes x86-32)))
  (setq ptr  (ccl::make-cstring buffer))
  (setq ret  (ccl::external-call "syscall"
                                 :signed-int 1
                                 :signed-int filedes
                                 :address ptr
                                 :unsigned-int nbytes
                                 :signed-int))
  (ccl::dispose-heap-ivector ptr)
  (mv (cons ret nil)
      x86-32))

Here is a little demo.

  ACL2 !>(include-book "systemcalls")
  [[ ... output elided ... ]]
   "/v/filer4b/v41q001/kaufmann/projects/x86/svn-y86-syscall/matt/systemcalls.lisp"
  ACL2 !>(update-x86-32-oracle-from-state x86-32 state) ; explained in Part IV
  (<x86-32> <state>)
  ACL2 !>(syscall-write 1 ; stdout
		 "hello" ; buffer, a string
		 (1+ (length "hello")) ; might not need 1+
		 x86-32
		 )
  hello ((6) <x86-32>)
  ACL2 !>(mv-let (result x86-32)
		 (syscall-write 1               ; stdout
				"hello"         ; buffer, a string
				(1+ (length "hello")) ; might not need 1+
				x86-32)
		 (mv (equal '(6)
			     (prog2$ (cw "~%~%some newlines~%~%") result))
		     x86-32))
  hello 

  some newlines

  (T <x86-32>)
  ACL2 !>; The following had better fail!
  (thm (mv-let (result x86-32)
	       (syscall-write 1               ; stdout
			      "hello"         ; buffer, a string
			      (1+ (length "hello")) ; might not need 1+
			      x86-32)
	       (declare (ignore x86-32))
	       (equal '(6) result)))
  [[ ... output elided ... ]]
  ******** FAILED ********
  ACL2 !>

The reason it's important for the THM form above to fail is that
different calls of syscall-write might execute to return different
values.  In fact, there is no way for a "live" x86-32 stobj with a
"hidden" oracle to be encountered in a term processed by the prover.
We'll return to this point below.

------------------------------------------------------------
III. Oracle implementation
------------------------------------------------------------

I'll start by describing modifications I made to:
/projects/acl2/devel/books/models/y86/y86-basic/common/x86-state.lisp

First, I added two new fields in (defstobj x86-32 ...):

  x86-32-oracle
  x86-32-oracle-hidden-p

I don't want to provide direct access to the oracle when the `hidden'
bit is set, I introduce new renamings in the same stobj definition:

   (x86-32-oracle unsafe-x86-32-oracle)
   (update-x86-32-oracle unsafe-update-x86-32-oracle)

Then we can define x86-32-oracle as follows, to prevent reading of the
oracle field when the hidden bit is set.

(defun x86-32-oracle (x86-32)
  (declare (xargs :stobjs x86-32))
  (prog2$ (and (x86-32-oracle-hidden-p x86-32)
               (er hard? 'x86-32-oracle
                   "Attempt to apply ~x0 to an x86-32 stobj with a hidden ~
                    oracle!"
                   'x86-32-oracle))
          (unsafe-x86-32-oracle x86-32)))

And similarly we define a safe updater, because we don't want to be
able to modify the oracle in a way that won't be reflected by our
system calls.

(defun update-x86-32-oracle (new-oracle x86-32)
  (declare (xargs :stobjs x86-32))
  (prog2$ (and (x86-32-oracle-hidden-p x86-32)
               (er hard? 'update-x86-32-oracle
                   "Attempt to apply ~x0 to an x86-32 stobj with a hidden ~
                    oracle!"
                   'update-x86-32-oracle))
          (unsafe-update-x86-32-oracle new-oracle x86-32)))

We then remove access to the unsafe versions of the oracle accessor
and updater.

(push-untouchable unsafe-x86-32-oracle t)
(push-untouchable unsafe-update-x86-32-oracle t)

We may now define the operation for obtaining a value from the
oracle.  For convenience, we treat the oracle as an association list,
where each entry associates an eip (program counter) with a list of
values.  Shilpi and I believe that this structure will make it easier
to reason about oracles during proofs than if the oracle were just a
flat list of values.  We use functions safe-assoc and safe-put-assoc
that are variants of assoc whose guards do not require the oracle to
be an association list.

(defun pop-x86-32-oracle (x86-32)
  (declare (xargs :stobjs x86-32))
  (let* ((oracle (x86-32-oracle x86-32))
         (oracle (if (consp oracle) oracle nil))
         (eip (eip x86-32))
         (vals (safe-assoc eip oracle))
         (val (car vals))
         (x86-32 (update-x86-32-oracle
                  (safe-put-assoc eip (cdr vals) oracle)
                  x86-32)))
    (mv val x86-32)))

Recall that the oracle is not actually used for execution when we have
a "live" y86 state with a "hidden" oracle.  How do we make the oracle
"hidden"?  We get it by reading, just once, the state's oracle!

(defund update-x86-32-oracle-from-state (x86-32 state)
  (declare (xargs :stobjs (x86-32 state)))
  (mv-let (erp val state)
          (read-acl2-oracle state)
          (let* ((x86-32 (update-x86-32-oracle (and (null erp) val) x86-32))
                 (x86-32 (update-x86-32-oracle-hidden-p t x86-32)))
            (mv x86-32 state))))

Then we make sure that the user can't unhide an oracle (which isn't
really there in the implementation), as follows.

(push-untouchable update-x86-32-oracle-hidden-p t)

At this point, the only way to make a "hidden" oracle is by calling
update-x86-32-oracle-from-state on a "live" state.  We might use
with-local-stobj to create a live x86-32 during a proof, but there is
no way (without misusing trust tags) to create a live state that
occurs in a term processed by the prover.

------------------------------------------------------------
IV.  Logical story
------------------------------------------------------------

To review: we have introduced an oracle field of the y86 state that
logically contains all the values returned by system calls.  Note that
there is no trust tag needed for the y86 model.  Thus, the logical
story is simple and clean.  Well, except: how do we explain execution
results that use the raw-Lisp code for syscall-write?

Normally, when we get an evaluation result in the top-level ACL2 loop,
we expect that the input is provably equal to the output.  Consider
the following, for example.

  ACL2 !>(+ 3 4)
  7
  ACL2 !>

We then expect that the input and output are provably equal, and in
our example above, that is verified by ACL2's prover:

  ACL2 !>(thm (equal (+ 3 4) 7))

  Q.E.D.

  Summary
  Form:  ( THM ...)
  Rules: NIL
  Time:  0.00 seconds (prove: 0.00, print: 0.00, other: 0.00)
  Prover steps counted:  1

  Proof succeeded.
  ACL2 !>

But when we run the y86 in the loop and get a result that depends on
system calls, then in what sense do we have a theorem?

We have a theorem with respect an input oracle that happens to have
all the correct system call results ready to be plucked off.  That
oracle is, logically, the car of the oracle of the ACL2 state.  The
beauty part is that we can just assert that the ACL2 state's oracle
happened to have that value at the time we consulted its oracle!

Soumava Ghosh has pointed out that this approach gives us an
interpreter that can allow behaviors that don't occur in practice,
such as reading a file that hasn't been opened.  I've discussed this
issue a bit with him and Shilpi.  I think we agreed that it may not be
necessary to restrict the model to rule out such behaviors.  But if
that becomes necessary, it will probably be possible to modify the
oracle, the interpreter, or both, to implement such restrictions.  For
example, we might add a field to the y86 state, or else add a "kernel"
argument to the interpreter, to associate file descriptors with their
statuses (e.g., associating 0 [stdin] with '(read), and associating
[stdout] and stderr with, say '(append)).  Or perhaps instead, the
oracle might associate clocks with system call results obtained at
those clocks, and a hypotheses could check that calls don't violate
the access-after-open protocol.