Homework 2: SAT & SMT solvers

Due date: February 6, 10pm
Grading: 5% of your course grade: 3% for Part 1 (SAT solving) and 2% for Part 2 (SMT solving)

SAT and SMT solvers are widely used for software verification and synthesis, as well as many other problems. They provide a type of "assembly language" for specifying and solving logical problems—many problems can be encoded in the form of SAT or SMT constraints and then efficiently solved with an off-the-shelf solver. Many of the papers we'll read this semester either directly use a SAT or SMT solver, or indirectly use one through a higher-level framework like Rosette or Dafny.

The goal of this homework is to introduce you to SAT and SMT solvers and how to use them. We'll be solving two problems, one with a SAT solver and one with an SMT solver. In both cases, the focus is on how to reduce a higher-level problem to lower-level SAT or SMT constraints and then interpret the results returned by the solver. Later homeworks (and the tools we'll read about) often delegate this reduction to a framework, but it's helpful to understand how those frameworks work and the interface they use to the underlying solver.

Prerequisites

Both the SAT and SMT parts of this homework are written in Python, so make sure you have Python 3 available (at least version 3.7; versions 2.x won't work). You'll also need Pip and Pipenv to set up the dependencies. Get Pip either from your OS package manager (where the package is usually called something like python-pip) or by running:

python -m ensurepip --upgrade

Then get Pipenv by running:

pip install pipenv

Set up the code

We'll be using GitHub Classroom to check out and submit this homework. Follow the GitHub Classroom URL on Canvas to create your private copy of the homework repository, and then clone that repository to your machine. For example, the repository it created for me is called hw2-jamesbornholt, so I would do:

git clone git@github.com:cs395t-svs/hw2-jamesbornholt.git
cd hw2-jamesbornholt

Now let's install all the packages we need, including for SAT and SMT solving:

pipenv install

This might take a while if it needs to compile the Z3 SMT solver from source (e.g., on an Apple Silicon Mac).

Pipenv creates a virtual environment to avoid clogging up your system's Python install with our dependencies. The documentation explains what this means, but the short version is that to run the code for this homework, either preface commands with pipenv run (e.g., pipenv run python se.py or pipenv run pytest), or run pipenv shell to spawn a new shell inside your virtual environment, where you can just run bare commands like python se.py or pytest.

To make sure everything's working, run:

pipenv run pytest

You should see 46 failing tests. You'll know you've finished the homework when all these tests pass!

Part 1: SAT solving

One of the sadder parts of being a computer scientist is that we tend to suck all the fun out of games by making computers play them instead. In this part of the homework, we'll do just that for Sudoku by building a tool that automatically solves Sudoku puzzles using a SAT solver.

Open the code in the sudoku directory. There are two important files here:

puzzle.py provides some helpers for working with Sudoku puzzles—parsing, accessing, and validating. Your Sudoku solver will need to interact with these methods, but you should not need to modify this file.
solve.py contains the code for the Sudoku solver. This is where you'll fill in your solution.

PySAT primer

We'll use PySAT to interface with a SAT solver. PySAT has higher-level interfaces to generate SAT encodings, but for this homework, you must not use them—do not import or use anything from the pysat module other than the Cadical class. Here's a quick primer on this interface (the same code is in the pysat_demo function in solve.py if you'd like to play with it).

The lower-level PySAT interface is similar to the DIMACS CNF format for encoding SAT problems in textual form. Suppose we have three boolean variables x, y, and z. In PySAT, boolean variables are identified by a positive integer:

x = 1
y = 2
z = 3

A clause is a list of literals and represents their disjunction—so a clause is satisfied if at least one of its literals is true. For example, this clause says that at least one of x, y, or z must be true:

at_least_one_true = [x, y, z]

To negate a variable, flip its sign. For example, this clause says that at least one of x, y, or z must be false:

at_least_one_false = [-x, -y, -z]

To solve a SAT problem, we have to construct an instance of the solver, and then add clauses to it. PySAT supports multiple SAT solvers, but we'll be using Cadical:

solver = Cadical()
solver.add_clause(at_least_one_true)
solver.add_clause(at_least_one_false)

Now we can run the solver. The solver checks if the conjunction of the clauses added to it is satisfiable—if there is an assignment to the boolean variables that makes every clause true.

assert solver.solve(), "the problem is satisfiable"

We can also get back a model, which gives us a satisfying assignment to each variable in the problem:

model = solver.get_model()

The model is a list of literals, such as [-1, 2, 3]. If a literal in the list is positive, the corresponding variable is set to true; if it is negative, the variable is set to false. There are multiple solutions to our problem, but we know any model must contain at least one positive and at least one negative literal:

assert len([var for var in model if var > 0]) >= 1, "must be at least one true variable"
assert len([var for var in model if var < 0]) >= 1, "must be at least one false variable"

If a problem is not satisfiable, we won't get back a model:

solver = Cadical()
solver.add_clause(at_least_one_true)
# Add three clauses, each saying a variable must be false (the same as calling 
# `add_clause` three times)
solver.append_formula([[-x], [-y], [-z]])
assert not solver.solve(), "the problem is not satisfiable"
assert solver.get_model() is None, "the problem has no model"
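
To make these semantics concrete, here is a minimal brute-force sketch in plain Python (no PySAT involved) of what the solver decides for the two problems above. The helper names clause_satisfied and brute_force_solve are made up for this illustration, and exhaustive search is only a stand-in: a real SAT solver like Cadical is far smarter than this.

```python
from itertools import product
from typing import Dict, List, Optional

Clause = List[int]


def clause_satisfied(clause: Clause, assignment: Dict[int, bool]) -> bool:
    """A clause is a disjunction: it holds if any one literal is true."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def brute_force_solve(clauses: List[Clause], num_vars: int) -> Optional[List[int]]:
    """Try every assignment; return a model as a list of literals, or None."""
    for values in product([False, True], repeat=num_vars):
        assignment = {var: values[var - 1] for var in range(1, num_vars + 1)}
        # The problem is the conjunction of its clauses: all must hold.
        if all(clause_satisfied(c, assignment) for c in clauses):
            return [var if assignment[var] else -var
                    for var in range(1, num_vars + 1)]
    return None


x, y, z = 1, 2, 3
sat_model = brute_force_solve([[x, y, z], [-x, -y, -z]], 3)
unsat_model = brute_force_solve([[x, y, z], [-x], [-y], [-z]], 3)
print(sat_model)    # [-1, -2, 3] with this search order
print(unsat_model)  # None
```

A real solver may return a different (equally valid) model for the first problem, which is why the assertions above only check for at least one true and one false variable.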

Building a Sudoku solver

Your task will be to implement the solve function in solve.py. It takes as input a Puzzle instance (the class defined in puzzle.py) that is incomplete—some of the cells contain None rather than a digit from 1 to 9. If that puzzle has a solution (a way to fill in all the None cells), solve returns a completed Puzzle. If the input puzzle has no solution, solve returns None.

We're concerned only with classic Sudoku: a 9×9 grid, with nine 3×3 subgrids (or "boxes"). Each row, column, and box must contain all the digits from 1 to 9. There's no need for your code to handle puzzles of other sizes.

We'll build our solver by reducing a Sudoku puzzle to a set of boolean variables and constraints over them. To give you an idea of where to go, your solver should follow this rough template:

def solve(puzzle: Puzzle) -> Optional[Puzzle]:
    # Construct a SAT solver
    solver = Cadical()

    # Add some clauses that encode the Puzzle, for example:
    solver.add_clause([-1, -2])
    solver.add_clause([-1,  2])
    # ...

    # Check if the clauses are satisfiable
    if not solver.solve():
        return None

    # If satisfiable, use the model to construct a new Puzzle
    model = solver.get_model()
    # ...

    return solved_puzzle

You'll know you're done when you can run this command:

pipenv run pytest -k sudoku

and see 35 passing tests and 0 failures.

Some tips and guidance for building your Sudoku solver:

The first decision you need to make is how many boolean variables you need to represent the Sudoku puzzle and how you'll index them (remember that boolean variables in PySAT or DIMACS CNF are identified by positive integers). I suggest writing a function to compute the index for a target variable, rather than repeating that logic everywhere, as you'll need it in many places.
Then you'll need to construct constraints to encode the rules of Sudoku. Remember that a clause is a disjunction, so it is satisfied if any of the literals in it is true. If you want to require that multiple things are all true, you need to add multiple clauses.
- Many Sudoku rules boil down to requirements about some collection of digits (a row, column, or box): requiring that a digit appears at most once in the collection, or that all the digits in the collection are distinct. Consider writing a common at_most_one or distinct function to generate those constraints for an input collection.
There are three types of tests in the test suite:
- The test_sudoku_sat tests are for puzzles known to be solvable, and will pass if solve returns either a completed puzzle or just True.
- The test_sudoku_sat_model tests use the same puzzles as test_sudoku_sat but will only pass if solve returns a completed puzzle. This should help you test your solving code separately from the code that translates a model into a completed puzzle—if you make your solver return True if the puzzle is solvable, the test_sudoku_sat tests should pass, and the test_sudoku_sat_model tests should not.
- The test_sudoku_unsat tests are for puzzles known to be unsolvable, and will pass only if solve returns None. It's always good practice to test negative examples in verification, as it's very easy to accidentally make your solver always succeed!
There are some clever encoding optimizations you can make by interpreting the input puzzle (for example, never creating a boolean variable if you already know what its value will be, or generating simpler constraints if you know partial values for them). These optimizations are good practice in general, as they can dramatically improve the SAT solver's scalability. But classic Sudoku is small enough that they're not necessary, so I suggest not bothering with them—just build the simplest encoding (in terms of implementation complexity) that you can.
SAT solvers are known to have unpredictable performance, but again, classic Sudoku is simple enough that this shouldn't be an issue. Solving performance is not a criterion for this homework, but it should still be reasonable. My solution can solve a puzzle in ~5ms. If you're seeing performance orders of magnitude worse than this, there's probably something wrong with your encoding.
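
As a rough illustration of the first few tips, here is one possible shape for the helper functions, in plain Python. Everything in it is an assumption rather than the reference solution: it uses one boolean variable per (row, column, digit) triple, assumes 0-based rows and columns (the Puzzle class may index differently), and uses the simple pairwise encoding for "at most one". The names var_index, decode_var, at_most_one, and exactly_one are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

N = 9  # classic Sudoku: 9 rows, 9 columns, 9 digits


def var_index(row: int, col: int, digit: int) -> int:
    """Map (row, col, digit) to a positive integer variable in 1..729.

    Assumes row and col are 0-based (0..8) and digit is 1..9.
    """
    return row * N * N + col * N + digit


def decode_var(var: int) -> Tuple[int, int, int]:
    """Inverse of var_index: recover (row, col, digit) from a variable."""
    var -= 1
    return var // (N * N), (var // N) % N, var % N + 1


def at_most_one(literals: List[int]) -> List[List[int]]:
    """Pairwise encoding: in every pair, at least one literal is false."""
    return [[-a, -b] for a, b in combinations(literals, 2)]


def exactly_one(literals: List[int]) -> List[List[int]]:
    """At least one literal is true, and no two are true together."""
    return [list(literals)] + at_most_one(literals)


# Example: the cell at row 0, column 0 holds exactly one digit.
cell_vars = [var_index(0, 0, d) for d in range(1, N + 1)]
cell_clauses = exactly_one(cell_vars)
print(len(cell_clauses))  # 37: one "at least one" clause plus C(9, 2) pairs
```

Each clause produced this way is a plain list of integers, so it could be handed to the solver with add_clause, or a whole list of clauses with append_formula, as in the primer above.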

Part 2: SMT solving and symbolic execution

Now that we're up to speed on how to encode problems in SAT, we can be more productive by instead using an SMT solver. Satisfiability modulo theories (SMT) solvers accept problems in a variety of theories and combinations thereof, including many theories useful for verifying and synthesizing programs. This should make reasoning about program behavior much easier than if we needed to encode everything as boolean variables for a SAT solver.

In this part of the homework, we'll build a simple symbolic execution engine for programs written in a fragment of the C language. The engine will encode the behavior of a program into SMT, and use an SMT solver to determine whether it's possible for the program to call abort(), which we'll take to mean the program can fail.

Open the code in the se directory. The important file here is se.py, which is where we'll implement our symbolic execution engine by filling in the TODOs in the SymbolicExecutor class.

Our simple fragment of C

C is a very complex language! We'll only be implementing a symbolic execution engine for a very small fragment of the language. As an example of the fragment of C we care about, here's the contents of the sat2.c test case:

void sat2(unsigned int x, unsigned int y) {
    if (x + y == 10) {
        x = 50;
    }

    if (x == 50) {
        if (x + y == 10) {
            nothing();
        } else {
            abort();
        }
    }
}

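This function can reach the call to abort(), for example when given x = 3 and y = 7. Because x + y == 10, the first if sets x to 50, and the inner check x + y == 10 then fails, sending execution to the else branch.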
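
One way to sanity-check a claim like this by hand is a concrete Python transliteration of sat2. The sketch below is not part of the homework code: sat2_aborts is a hypothetical helper that masks additions to 32 bits (the width we assume for unsigned int) and reports whether the abort() call would be reached for the given concrete inputs.

```python
MASK = 0xFFFFFFFF  # unsigned int is assumed to be 32 bits wide


def sat2_aborts(x: int, y: int) -> bool:
    """Concrete Python port of sat2; returns True if abort() is reached."""
    if ((x + y) & MASK) == 10:
        x = 50

    if x == 50:
        if ((x + y) & MASK) == 10:
            return False  # corresponds to the call to nothing()
        return True  # corresponds to the call to abort()
    return False


print(sat2_aborts(3, 7))            # True: x becomes 50, then 50 + 7 != 10
print(sat2_aborts(1, 1))            # False: neither outer branch is taken
print(sat2_aborts(50, 2**32 - 40))  # False: 50 + y wraps around to 10
```

The last call shows why wraparound matters: there are inputs for which the inner x + y == 10 check still succeeds after x is set to 50, but only because the 32-bit addition overflows.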

The se.py file includes comments illustrating exactly which features of C need to be handled, but here's a summary of the assumptions we'll make:

We will only handle symbolic execution of a single function (like sat2).
The function is already known to compile correctly, so there's no need to worry about type errors, undeclared variables, etc.
All arguments to the function, and all variables declared in the function, will be of type unsigned int. There are no global variables.
The only function call with any meaning is abort(); the symbolic execution engine will treat calls to abort() as failures and return concrete inputs that will trigger the call. All other function calls can be ignored (the only other function call in the test cases is nothing()).
Assignments will only be made to primitive lvalues (i.e., only to variables).
Variable declarations will always contain an initial value (i.e., there are no declarations of the form unsigned int x;).
There will be no control flow statements other than if/else—so no loops (for or while), no switch/case statements, and no gotos.
The only side-effecting operations are assignments, and they will only appear as standalone statements. In other words, there's no need to worry about tricky code like if ((x = 0) && ...) that performs assignments in the middle of an expression.
We will handle a variety of binary operations (the full list is in a comment in se.py). Two important points about them:
- All integer values are of type unsigned int, and so arithmetic operations are required to wrap around on overflow. We'll assume unsigned int is 32 bits.
- We support division, but will assume there is no division by zero.
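
As a concrete reference for the wraparound requirement, this small sketch models 32-bit unsigned arithmetic in plain Python. It only illustrates the intended C semantics and is not code from se.py; the names UINT_BITS, UINT_MOD, and wrap are made up here.

```python
UINT_BITS = 32
UINT_MOD = 1 << UINT_BITS  # 2**32


def wrap(value: int) -> int:
    """Reduce an arbitrary Python integer to the unsigned 32-bit range."""
    return value % UINT_MOD


print(wrap(4294967295 + 1))  # 0: addition overflows past the maximum value
print(wrap(0 - 1))           # 4294967295: subtraction wraps below zero
print(wrap(65536 * 65536))   # 0: multiplication wraps too
print(7 // 2)                # 3: unsigned division truncates
```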

If in doubt about what features you need to support, the tests directory contains 11 example programs—6 that can reach a call to abort() and 5 that cannot. For this homework, your symbolic execution engine only needs to produce the correct results for these 11 programs; we won't test it on any others (but please don't do something silly like hardcoding the results for these 11 programs!). The intention is for this fragment of C to be just big enough to be interesting, but small enough that it shouldn't be a herculean task to define its semantics. If there's still any doubt, please ask.

Implementing symbolic execution

The SymbolicExecutor class takes as input an AST node representing a C function definition. When you call execute() on the SymbolicExecutor, it performs symbolic execution to determine whether there's any way that C function can call abort(). If so, execute() returns concrete inputs to the C function that would cause it to call abort() when invoked; if not, it returns None.

Your task is to fill in all the TODOs in se.py to produce a working symbolic execution engine. These TODOs and the comments around them provide guidance around exactly what to implement, so be sure to read the comments and docstrings. You'll know you're done when you can run this command:

pipenv run pytest -k se

and see 11 passing tests and 0 failures.

Our symbolic execution engine works by collecting all possible paths through the program. Each path is a pair of a path condition (a boolean formula that is true along that path) and a state (the values of all variables available on that path). The recursive execute_node function that you need to fill in both takes as input and returns a list of paths—the idea is that it will execute a single AST node on each of the given paths, and return the resulting set of paths. That set of paths might be larger than the input set (if the node creates multiple paths) or it might be the same size (e.g., if the node is just straight-line code). When given the entire body of a C function as the input AST node, and the single path (True, state), execute_node should return every possible path through the function.

In general, we could use the paths generated by symbolic execution to do many types of analysis. Here, we're focused only on checking abort()s. To do so, the symbolic execution engine also tracks an aborted formula that is updated every time a path reaches an abort() call. At the end of the symbolic execution, the execute function discards the results of execute_node, and instead just checks whether the aborted formula is satisfiable; if so, it declares that the program could abort. You therefore need to update aborted inside execute_node such that it is satisfiable if and only if the program could abort.
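
To illustrate the idea of forking paths and accumulating an aborted formula, here is a toy, solver-free sketch. It deliberately does not use Z3 or the AST classes from se.py: formulas are plain Python closures over concrete inputs, and exhaustive search over a tiny input range stands in for the SMT solver. All names in it (fork, x_greater_than_5, and so on) are hypothetical, and the real engine is structured differently.

```python
from typing import Callable, Dict, List, Tuple

Inputs = Dict[str, int]
Expr = Callable[[Inputs], int]      # a program value, as a function of the inputs
Formula = Callable[[Inputs], bool]  # a boolean formula over the inputs
State = Dict[str, Expr]
Path = Tuple[Formula, State]        # (path condition, state)


def fork(paths: List[Path],
         test: Callable[[State], Formula]) -> Tuple[List[Path], List[Path]]:
    """Split each path at an `if`: one copy takes the branch, one skips it."""
    taken: List[Path] = []
    skipped: List[Path] = []
    for cond, state in paths:
        guard = test(state)
        taken.append((lambda i, c=cond, g=guard: c(i) and g(i), dict(state)))
        skipped.append((lambda i, c=cond, g=guard: c(i) and not g(i), dict(state)))
    return taken, skipped


def x_greater_than_5(state: State) -> Formula:
    x = state["x"]  # capture the current expression for x
    return lambda i: x(i) > 5


def x_equals_0(state: State) -> Formula:
    x = state["x"]
    return lambda i: x(i) == 0


# Toy program:  if (x > 5) { x = 0; }   if (x == 0) { abort(); }
paths: List[Path] = [(lambda i: True, {"x": lambda i: i["x"]})]

taken, skipped = fork(paths, x_greater_than_5)
for _, state in taken:
    state["x"] = lambda i: 0  # the assignment x = 0 on the taken paths
paths = taken + skipped       # two paths after the first `if`

abort_paths, _ = fork(paths, x_equals_0)
conditions = [cond for cond, _ in abort_paths]


def aborted(inputs: Inputs) -> bool:
    """True if some path that reaches abort() is feasible for these inputs."""
    return any(cond(inputs) for cond in conditions)


# Exhaustive search over a tiny range stands in for the SMT solver here.
witnesses = [x for x in range(10) if aborted({"x": x})]
print(len(paths), witnesses)  # 2 [0, 6, 7, 8, 9]
```

Note that the aborted check is a disjunction over the path conditions of every path that reaches abort(), so it is satisfiable exactly when at least one such path is feasible.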

Some tips and guidance for building your symbolic execution engine:

Start small. You should be able to get the sat0.c and unsat0.c test cases working without completing the entire engine.
The first decision you'll need to make is how to construct symbolic values for the inputs to the function (the first TODO in se.py). Z3 supports several theories and therefore several types of values, including Int, BitVec, Real, Float, etc. You probably want to choose one that closely resembles the behavior of an unsigned int in C, since all our values are unsigned ints.
You can run a single C file through the engine by running python se.py path/to/file.c. This will also print out the AST for the C program, which might be useful to identify how to traverse parts of the AST.
Make sure that every branch of execute_node that you implement eventually returns a list of program paths, otherwise you will get strange failures. Some of these branches will modify the list of program paths, including possibly adding or replacing paths, while others will return it unchanged.
I recommend these slides by Emina Torlak for a visual introduction to symbolic execution.
Many resources for symbolic execution will talk about dynamic execution or about checking the feasibility of paths during the execution process. These optimizations help scalability by pruning paths when they are provably unreachable, so we don't waste time executing code that can't be reached. Our tests are simple enough that there's no need to implement these optimizations, so they're not required here. The SymbolicExecutor invokes the solver only once, at the conclusion of the entire execution.

Z3 Python primer

We'll be using the Python bindings to the Z3 SMT solver to check whether a path to abort() is reachable. The code to invoke the solver is already provided in the execute method. However, you'll need to manipulate Z3 formulas and variables in a few places. The SAT/SMT by Example book by Dennis Yurichev contains many examples of using these Python Z3 bindings, as does this older guide by Z3's original author, Leonardo de Moura. Here's a crash course on how the Z3 Python bindings work (the same code is in the z3_demo function in se.py if you'd like to play with it).

The Z3 interface starts with creating variables. Z3 is an SMT solver, so we can create variables of several different types, and Z3 will use different theories to discharge them. Variable constructors also take as input a name for the variable. An Int is a mathematical integer:

x = z3.Int("x")

A BitVec (bitvector) is a machine integer, that is, a fixed-width vector of bits (here 32 bits):

y = z3.BitVec("y", 32)

A Bool is a boolean:

z = z3.Bool("z")

Unlike a SAT solver that requires constraints to be in conjunctive normal form, most SMT solvers will accept constraints (also known as "assertions") in any format. So we can build arbitrary formulas and pass them to Z3. These formulas can also span different theories:

clause = z3.Or(x > 0, z3.And(y > z3.BitVecVal(0, 32), z))

To solve an SMT problem, we construct an instance of the solver and add assertions to it:

s = z3.Solver()
s.add(clause)

Now we can run the solver to check whether all the assertions we added to it are satisfiable:

assert s.check() == z3.sat, "the problem is satisfiable"

We can also get back a model, which gives us a satisfying assignment to each variable:

m = s.model()
print(f"x={m[x]}, y={m[y]}, z={m[z]}")

One catch to be aware of is that values in the BitVector theory aren't inherently treated as either signed or unsigned. For the operations where signedness matters, you have to choose explicitly which variant you want. For example, comparing two 16-bit bitvectors:

x = z3.BitVecVal(0b1111111111100101, 16)  # 65509 if unsigned, or -27 if two's-complement signed
y = z3.BitVecVal(0b0000000000001100, 16)
s = z3.Solver()
s.add(x > y)  # Z3 maps `>` to *signed* comparison
assert s.check() == z3.unsat, "x < y in signed comparison"
s = z3.Solver()
s.add(z3.UGT(x, y))  # "Unsigned Greater Than", the unsigned version of `>`
assert s.check() == z3.sat, "x > y in unsigned comparison"
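
The two readings of that 16-bit pattern can be double-checked in plain Python. This sketch does not use Z3; to_signed is a hypothetical helper that reinterprets an unsigned bit pattern as a two's-complement signed integer.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret an unsigned bit pattern as a two's-complement signed integer."""
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


x = 0b1111111111100101
y = 0b0000000000001100

print(x, to_signed(x, 16))                  # 65509 -27
print(x > y)                                # True: unsigned comparison
print(to_signed(x, 16) > to_signed(y, 16))  # False: signed comparison
```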

What to submit

Submit your solutions by committing your changes in Git and pushing them to the private repository GitHub Classroom created for you in the Set up the code step. If you haven't used Git before, Chapters 1 and 2 of the Git book are a good tutorial.

The only files you should need to modify are sudoku/solve.py and se/se.py. GitHub will automatically select your most recent pushed commit before the deadline as your submission; there's no need to manually submit anything else via Canvas or GitHub.

GitHub Classroom will autograde your submission every time you push, by just running pytest on exactly the same test cases you can run locally. If you can't complete the entire homework and so the autograder fails, don't worry—I will still be grading manually for partial credit.

CS 395T: Systems Verification and Synthesis Spring 2023