**Exercise 2** *for [CS 361S](.)*
*Due: Friday, February 9th, 11:59 PM*

# Goal

The goal of this exercise is to understand how GCC implements stack canaries on RISC-V. Along the way, we'll also get practice with GDB and with RISC-V assembly.

# The Environment

We will use the class VM you installed in [exercise 1](ex1.html).

You'll also want to grab a copy of the glibc (GNU C standard library) sources. You can download a source tarball from [the GNU FTP server](https://ftp.gnu.org/gnu/glibc/glibc-2.37.tar.gz).

From within the VM, you can also easily obtain the glibc source exactly as compiled to make the Debian package. First, as root, edit `/etc/apt/sources.list` to add the line

```
deb-src [arch=riscv64] http://deb.debian.org/debian unstable main
```

(this is the same as the line already present, but with the first word `deb-src` instead of `deb`). Then, as root, install the `dpkg-dev` package, using the command `apt install dpkg-dev`. Now, as the non-root `debian` user, you can fetch the source code for the `libc6` package using the command `apt-get source glibc`.

(No need to actually build glibc; we'll just be doing some code searches. Now may be a good time to look into code-searching tools! A good simple option is a fast [language-aware filetree grep](https://beyondgrep.com/feature-comparison/) tool, such as `ack`, `ag`, or `rg`; on Debian, these are the `ack`, `silversearcher-ag`, and `ripgrep` packages.)

# The Exercise

Download [`ex2.tar.gz`](ex2.tar.gz) in your course VM and unpack it. In the `ex2/` subdirectory, you will find a simple vulnerable program, `ex2.c`, and a Makefile that builds it with (`ex2-canary`) and without (`ex2-nocanary`) stack canaries. The Makefile also arranges to save the assembly output of GCC in `ex2-canary.s` and `ex2-nocanary.s`, so you can see what GCC produced before the assembler and linker have mauled it. Running `make` in the `ex2` directory should build everything.
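As background for comparing the two builds, here is a minimal hand-written sketch of the idea behind a stack canary. It is not what GCC emits. All names (`demo_guard`, `struct demo_frame`, `demo_copy_and_check`) are hypothetical, the guard is a fixed constant rather than a per-run random value, and the "frame" is an ordinary struct so that the overwrite stays within defined C behavior.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the process-wide guard value. The real
   variable's name and how it is initialized are for you to discover. */
static unsigned long demo_guard = 0x5ca1ab1eUL;

/* A pretend activation record: a buffer with a guard copy right after it. */
struct demo_frame {
    char buf[16];
    unsigned long canary;
};

/* Copies input into the pretend frame without respecting sizeof buf,
   then checks the guard copy. Returns 1 if it is intact, 0 if it was
   overwritten. */
int demo_copy_and_check(const char *input)
{
    struct demo_frame frame;
    size_t n = strlen(input);

    /* "Prologue": clear the buffer and stash a copy of the guard. */
    memset(frame.buf, 0, sizeof frame.buf);
    frame.canary = demo_guard;

    /* The bug: the copy ignores sizeof frame.buf. It is clamped to the
       whole struct only so that this demo never writes outside it. */
    if (n > sizeof frame)
        n = sizeof frame;
    memcpy(&frame, input, n);

    /* "Epilogue": compare the stashed copy against the guard. */
    return frame.canary == demo_guard;
}
```

Calling `demo_copy_and_check("short")` returns 1, while a 24-character argument spills past `buf` into the guard copy and returns 0. The real mechanism differs in exactly the details the exercise asks you to work out: where the guard lives, which instructions load and compare it, and what runs when the comparison fails.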
## Stack canary prologue and epilogue

Compare the `.s` files produced with and without canaries. Which instructions were added to the `buffer` function prologue and epilogue? (Why was the `nobuffer` function unchanged?) Place the added prologue instructions in `prologue.txt` and the added epilogue instructions in `epilogue.txt`. Format them just as they are in the `.s` file, including indentation. Don't include lines that were incidentally changed because stack offsets for other parts of the `buffer` function's activation record changed; include just the instructions that add new functionality.

## When the ~~levee~~ canary breaks

What happens if the canary check fails? You can test just by running the program with a sufficiently long argument. The old-school hacker way to supply a long argument on the command line is a Perl expression in backticks,

```
[some-prog] arg1 arg2 `perl -e 'print "A"x200'`
```

but you can also just mash some keys if you prefer. You should see a one-line error message printed out by the program (and possibly additional diagnostic information printed out by your shell).

Now, with GDB, trace execution of the failed check all the way to program exit. You should see three functions execute, one calling the next. (Step over intermediate function calls to string utilities; we're interested just in the very last call made by the first, second, and third functions.) The third of these functions will call `abort`, which has a variety of methods to end the process; the one that seems to be used involves a couple more function calls and, eventually, a `tgkill` system call.

Place the one-line error message and the names of the three functions that are called in succession when the canary check fails in `fail.txt`, not including `abort` and the functions it calls. This file should have exactly four lines.

## The origin of the canary

Now, let's investigate how the canary gets its value.
Confirm, for example with GDB, that the canary is different on different program executions. In the RISC-V GNU toolchain we are using, the canary value is stored in a global variable (that is normally a symbol exported by libc.so, but our Makefile specifies static linking to make your life a bit easier). What is that variable called?

More importantly, how does that variable get initialized? The initialization happens in different ways on different platforms, so it's hard to search for in the glibc source. Instead, [set a watchpoint](https://sourceware.org/gdb/onlinedocs/gdb/Set-Watchpoints.html) on the variable name in GDB before letting `ex2-canary` proceed past `_start`, then continue program execution until your watchpoint fires. (It'll take a little while to get there! Normally, watchpoints have hardware support that makes them fast, but our QEMU process emulation doesn't provide that feature.) Note that GDB will insist that you supply a type (using C typecast notation) for the canary variable, since it needs to know exactly how many bytes in memory to watch. The correct type is `unsigned long`.

Which function triggers the watchpoint? Disassemble it and look at the instructions immediately before `pc` (the ones above the line marked `=>` in the margin) to see what *other* variable it's copied from.

We could continue with GDB, but now we have enough information to find the interesting bits in the glibc source that you have available. Let's start using `rg` (or whatever code search tool you've settled on). Find the actual function in the glibc source that sets the canary variable from the buffer it's copied from. Note that the function name you saw in GDB is `#define`d, so the name won't appear verbatim right above the function body in the glibc source.

This function, in turn, calls another function (defined in two header files; the version relevant to us is the Linux-specific version in `sysdeps/unix/sysv/linux`) to do the actual copying.
What's up with the endianness-specific bit manipulation?

Finally, let's look at where the-buffer-copied-into-the-canary-variable is initialized. Again, you'll want to look at the Linux-specific version. What data structure is parsed to set this buffer? (You can check your work by going back to GDB and typing the command `info auxv`, then looking at memory!)

Did we say "finally"? Not quite. Let's find out how those bytes got there. Visit the [Linux kernel cross-reference](https://elixir.bootlin.com/linux/latest/source), search for the auxv entry name, and find the kernel function that sets this entry for a new process. What function does the kernel use to generate the random bytes?

We expect you to understand all of the steps above, but, for simplicity, we'll ask you to turn in just a few of them.

In `vars.txt`, put the name of the variable used to initialize the `buffer` function's stack canary on the first line, the name of the variable it's copied from on the second line, and the `auxv` expression the second variable is copied from on the third line. (The third line is going to involve square brackets; the first two lines will just be one C symbol name each.)

In `fun1.txt`, copy the complete source code (from the glibc-2.37 source tree you searched above) for the function that sets the canary variable whose name is the first line of `vars.txt`. Start from the line that reads `STATIC int` and continue through the line with the closing brace.

In `fun2.txt`, copy the complete source code for the helper function that the function in `fun1.txt` calls to actually copy the value from the variable in the second line of `vars.txt`, the one we saw do the interesting bit twiddling. Start from the `static inline ...` line and continue through the line with the closing brace.

In `fun3.txt`, copy the complete source code for the setup function that initializes the variable in the second line of `vars.txt` from the expression in the third line of `vars.txt`.
Start with the two-line comment before the `static inline` line and continue through the closing brace.

# Logistics

You will submit using Gradescope. You should submit a zip file of your solution, without directory structure. Your solution should include at least the following files:

* `prologue.txt` and `epilogue.txt`, the assembly prologue and epilogue added to implement stack canary checks.
* `fail.txt`, the error message printed and the names of the three functions that execute when a stack canary check fails.
* `vars.txt`, the two variable names and the C expression you saw when tracing the source of canary values.
* `fun1.txt`, `fun2.txt`, and `fun3.txt`, the three C functions in glibc that initialize the variables in `vars.txt`.

# Grading

The exercise will be graded out of 7 points: 1 each for correct answers in the seven files above.