Proving the Coding Interview II

28 Apr 2022 | toString() 2 furious

This is the second of a three part series: Part 1 | Part 2 | Part 3

Welcome back to Proving the Coding Interview! In the previous article we started to verify and implement an implementation of Fizzbuzz in the Dafny programming language. But we left off one important piece: Dafny doesn’t yet have built-in functionality to convert a number value to a string, which we need to complete Fizzbuzz()’s behaviour. We’ll attempt to write such a conversion today and fix up some implementation details in the next and final post.

As it happens, I’ve also been asked to do this in a programming interview, so we’re getting two practice problems for the price of one here!

We’ll also take this opportunity to dig a bit deeper into the philosophy of formal verification. Does formal verification supersede traditional testing methodologies? Does “verified” actually mean anything, anyway? Should we “verify” anything and everything we write? Stay tuned…

Our Problem statement:

As before, let’s begin by writing out the concrete problem that we are going to solve:

To convert a number `n` into a string:

1) If `n` is less than 10, we know only one digit suffices to represent it, so
just return the one-character string containing that digit and we are done.

2) If `n` is equal to or greater than 10, then more than one digit is needed; so:
    a): first compute the string for all but the 1's place,
    b): then compute the one-character string for the 1's place,
and then concatenate the two together into the final sequence.

This is a recursive problem statement! It isn’t exactly obvious how we could write this with the kind of logical statements we used last time. The thing we’d kind of like is to be able to hand Dafny a specification that’s structured like this problem statement.

Dafny Functions are Ghost Code

So far, we’ve only written Dafny methods - as we saw, a method compiles down into executable code, and can do everything that we’d expect from a general-purpose programming language: it can call other methods, loop, recur, declare local variables, modify state, allocate objects on the heap, perform side-effects, whatever you like. This means that somewhere inside our Fizzbuzz.dll file we have .NET bytecode generated for Fizzbuzz() but not any of the requires or ensures clauses.

A Dafny function, however, is a different beast: it’s an example of ghost code. In formal verification, ghost constructs exist only as dialogue between the programmer and the verifier, and get omitted from the final compiled binary. Remember from last time that pre- and post-conditions were also left out; this makes functions another way to write more elaborate specifications. (If you’ve heard of types being erased after type-checking then the notion of ghost state might feel familiar to you.)

The downside to functions is that their use is more restricted. functions can’t have side-effects since they never execute at runtime. Conversely, outside of specification clauses like requires, ensures, and invariant, which themselves are ghost constructs, functions can’t be called in a method. In that sense, ghost code and real code really do occupy distinct universes.

Our string-to-digits specification function

Here’s my attempt at translating the English-language problem statement into Dafny functions.

// Specifies how to convert a natural number to a string
function ntos(n: nat): string
    ensures forall i :: 0 <= i < |ntos(n)| ==> '0' <= ntos(n)[i] <= '9'
{
    if n < 10
    // Base case: A nat on [0, 10) is just one character long.
    then [dtoc(n)]
    // Inductive case: Compute all but the last character, then append the final one at the end
    else ntos(n/10) + [dtoc(n % 10)]
}

// Specifies how to convert a base-10 digit to an ASCII character
function dtoc(n: nat): char
    requires 0  <=      n  <=  9
    ensures '0' <= dtoc(n) <= '9'
{
    '0' + n as char // `as` is the type-casting operator
}

Just like how loops need to provably terminate, recursion in Dafny must reach a base case too. If we left off the single-digit case in ntos() or recurred with a non-decreasing argument Dafny would reject this. (I’ll stick with the Dafny convention seen elsewhere that function names will be lowercase, whereas methods will be CamelCase.)

Specifications specify behaviour, not performance

Notice that ntos() seems higher-level and more declarative than the other Dafny code we’ve written so far. Our Fizzbuzz() method, by contrast, had a bunch of loops and picky indexing that we had to think hard about. This contrast is intentional: our goal here is to state the problem abstractly as clearly as we can, without worrying about practical concerns like performance or whether we’re using exactly the right data structures.

In a coding interview, it’s not unreasonable for the interviewee to be asked the time complexity of their solution. If we suppose that the + string concatenation operation is O(n), and if we recurred for every digit, then our solution here would be O(n * log n). We can probably imagine a different solution with equivalent behaviour and better performance (say, if we pre-allocated the structure to hold our digits in advance, avoided recursion, or used machine word-sized integers instead of BigInteger nats).

The tradeoff, of course, is that code that does more things has the opportunity to get more things wrong. But having a high-level, abstract specification function and a lower-level, concrete implementation method gives us the best of both worlds, so long as we can relate the behaviour of the two.

This is exactly where we are headed: our implementation will have an eye for runtime efficiency and other practical concerns at the expense of clarity, and Dafny will prove that whatever our implementation does is modeled by the abstract function. This reduces the amount of trust we need to have in our implementation, since now you only need to trust the high-level “easy to read” specification.

Can we really trust our friends?

A few words about ntos()’s ensures clause: This postcondition simply says that each character in the string should be a valid base-10 digit. This is of course necessary, but I can hear some of you screaming that it isn’t sufficient to prove “correctness” of this function. Don’t worry, I agree! This post-condition tells us that the returned string must be a representation of a number, but very little about how it relates to the supplied nat. A constant function that returns "42" on all inputs would satisfy this postcondition, and we definitely wouldn’t want to call that a “provably-correct solution”.

When many folks first learn about cool things like automated theorem provers or type theory, end up with the misconception that the goal is incontrovertible and exhaustive program correctness. Others – including some who responded “vigorously” on certain colour-coded tech link aggregation websites to part 1 of this series – dismiss formal methods for the opposite reason. They end up feeling that formal methods claim incontrovertible correctness, which they dismiss as impossible.

The latter people are right on one point: “total correctness”, whatever that might mean, is a bar that we shouldn’t expect to reach. We saw this last time: Our nominally-correct Fizzbuzz() was only as good as our (incomplete) specification. Whether we’re talking about our programs’ correctness, our compiler’s correctness, our compiler’s compiler’s correctness, or just life in general, we always have to trust something. Anyone who argues otherwise is probably uninformed or trying to sell you something.

It’s on all of us to treat formal methods as another powerful tool in our toolbelt, but not as something that it has no hope of living up to. Even though ntos() is getting checked by a pretty cool verifier, the fact that its specification has holes in it is a valid concern. In the same way, we shouldn’t be completely convinced by the presence of unit tests or fancy dependent types in other languages. We can use all these techniques (perhaps multiple ones together!) to increase our confidence of a correct implementation, but “total” correctness lives and will forever live at the rainbow-end of an asymptote.

That said, if we can do better then let’s do so - let’s improve our confidence in our specification by being more precise about what it should do!

One property that feels reasonable is that if we convert some nat to a string, and then somehow converted that string back to a nat, we should have the same natural number again. So, let’s write another short function that does just that!

function ston(s: string): nat
    requires |s| > 0  // Every string representation of a nat must be nonempty
    requires forall i :: 0 <= i < |s| ==> '0' <= s[i] <= '9'
{
    if |s| == 1
    // Base case: A string of length 1 only has one digit to consider
    then (s[0] - '0') as nat
    // Inductive case: Recur on all but the last character of 's', then shift the digits over for the final one
    else ston(s[..|s| - 1]) * 10 + (s[|s| - 1] - '0') as nat
}

As an example, we can see that ntos(12345) == "12345", and ston("12345") == 12345, so ston(ntos(12345)) == 12345. Our general claim here is that for arbitrary nats, ston(ntos(n)) == n. This general fact is pretty important and ought to stand on its own, so let’s factor it out into its own lemma. A lemma is akin to a “helper proof” that we can make use of in a larger context. Here’s our lemma, which checks out according to Dafny:

lemma sninv(n: nat)
    ensures ston(ntos(n)) == n
{}

Notice that the body of the lemma is empty: the only reason it exists is for Dafny to verify that its postcondition is satisfied!

Okay, could we increase our confidence further? What about proving invertibility in the opposite direction? We might say “well, by a similar argument, ntos(ston("12345")) == "12345"”, so can we write another lemma stating this as a general property too? If so, then we’ve bootstrapped two correctness proofs in terms of each other, which would be pretty cool. Let’s try that and see how far we get.

lemma nsinv(s: string)
    ensures ntos(ston(s)) == s // Will Dafny accept this without complaint?
{}

Dafny immediately starts complaining out of the gate: it tells us that “a function precondition might not hold”, and gives us the counterexample string "%". Aha, of course! In the first lemma, we implicitly used the fact that every nat has a string representation, but not every string can be converted to a nat, so this makes sense. Let’s fix this lemma with some preconditions, stating that we are only concerned with strings that satisfy the precondition for ston(). That way, we’re restricting our inputs to the space of strings that the conversion function would provably accept anyway.

lemma nsinv(s: string)
    requires |s| > 0
    requires forall i :: 0 <= i < |s| ==> '0' <= s[i] <= '9'
    ensures ntos(ston(s)) == s
{}

Is Dafny satisfied now? Interestingly, no! And the counterexample it finds is pretty awesome. Dafny gives me the string "04" as an example where ntos(ston(s)) != s, and this makes sense! The presence of leading zeroes means that nats don’t have a unique string representation: ntos(ston("04")) == "4"! There are any number of ambiguous strings, too: 4 == ston("4") == ston("04") == ston("00000004"). I wonder how many format string vulnerabilities have come about because of a tacit assumption like this?

There are ways to make more precise which of the equivalent strings will be generated, and encode that as part of our specification. But, let’s leave that discussion until the very end of the post.

How does this tie back into Fizzbuzz()?

…Hey, remember that we were trying to write Fizzbuzz() at some point?

This is where we left Fizzbuzz() off last time: remember that we didn’t have a way of specifying the “otherwise, produce the number in string form” post-condition. We simply left it out, as a result.

method Fizzbuzz(n: int) returns (ret: array<string>)
...
//ensures forall i :: 0 <= i < n ==> i % 3 != 0 && i % 5 != 0 ==> ret[i] == i // TODO
{
...
    while j < n
    ...
    //invariant forall i :: 0 <= i < j ==> i % 3 != 0 && i % 5 != 0 ==> ret[i] == i // TODO
    {
    ...
        if (curr_modified) {
            ...
        } else {
            ret[j] := "TODO: convert j to a string";
        }
    }
}

What can ntos() buy us here? Well, the specification for Fizzbuzz() ought to say that ret[i] must be the string representation of i, which we now have a way to specify! Concretely, our postcondition can state that ret[i] == ntos(i)!

method Fizzbuzz(n: int) returns (ret: array<string>)
...
ensures forall i :: 0 <= i < n ==> i % 3 != 0 && i % 5 != 0 ==> ret[i] == ntos(i) // Yay!
{
...
    while j < n
    ...
    invariant forall i :: 0 <= i < j ==> i % 3 != 0 && i % 5 != 0 ==> ret[i] == ntos(i) // Woo!
    {
    ...
        if (curr_modified) {
            ...
        } else {
            ret[j] := "TODO: convert j to a string"; // Hmm...
        }
    }
}

Of course, we’re not done yet: Dafny tells us that our loop invariant isn’t being maintained because we’re still assigning that TODO string in the implementation. What happens if we change that assignment to ret[j] := ntos(i)? Dafny tells us, A call to a ghost function is allowed only in specification contexts. This fits our understanding of ghost constructs living in two parallel worlds.

One simple solution is to mark ntos() as both a function and a method. So-called function methods can be thought of as methods whose runtime behaviour is restricted enough that they can’t perform any operation that a function couldn’t, like mutate state. Since ntos() is side effect-free and only computes a return value, we can mark it as a function method and Dafny will let us both use it in both specifications and implementations.

// Specifies how to and implements to convert a natural number to a string
function method ntos(n: nat): string
    ensures forall i :: 0 <= i < |ntos(n)| ==> '0' <= ntos(n)[i] <= '9'
{
    if n < 10
    // Base case: A nat on [0, 10) is just one character long.
    then [dtoc(n)]
    // Inductive case: Compute all but the last character, then append the final one at the end
    else ntos(n/10) + [dtoc(n % 10)]
}

method Fizzbuzz(n: int) returns (ret: array<string>)
...
ensures forall i :: 0 <= i < n ==> i % 3 != 0 && i % 5 != 0 ==> ret[i] == ntos(i) // Yay!  Calling a function method as a function!
{
...
    while j < n
    ...
    invariant forall i :: 0 <= i < j ==> i % 3 != 0 && i % 5 != 0 ==> ret[i] == ntos(i) // Woo!  Calling a function method as a function!
    {
    ...
        if (curr_modified) {
            ...
        } else {
            ret[j] := ntos(i); // Nice! Calling a function method as a method!
        }
    }
}

This change is enough to get our Fizzbuzz() verified and compiling!! Let’s run it and see how it looks.

$ ~/code/dafny/Binaries/Dafny Fizzbuzz.dfy && dotnet Fizzbuzz.dll | head -n 20

Dafny program verifier finished with 8 verified, 0 errors
Compiled assembly into Fizzbuzz.dll
fizzbuzz
1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
11
fizz
13
14
fizzbuzz
16
17
fizz
19
$

Look at that! At long last, we have a working solution!

But nobody will actually do this, Nathan!

Hopefully you’ve found this a fun way to learn a bit of formal methods and a funny thought experiment. Some of the more literally-minded among you may be wondering whether I’m actually advocating for readers doing their next interview loop in Dafny. By all means go ahead, but choosing an uncommon language might be slightly-high risk for you. If I can get bounced from a phone screen for not using the interviewer’s favourite language (Ruby developers can be… a lot sometimes), you might be setting yourself up for a frustrating job hunt.

That said, I do genuinely think that there are lessons here that can apply to your next whiteboard interview, even if you’re sticking with Python or Java or whatever your favourite interviewing language is.

In both blog posts, we began thinking through the problem by way of writing a textual problem statement. This did more than just kill a few minutes of time: I really do think this helps ground the interviewee within the problem before any code actually gets written. This is also a great time to write example calls with expected outputs; we did that this time when trying to concoct a general lemma for ntos()’s relationship to ston(), for instance.

Yes, it’s true that most languages don’t have compile-time invariant checks, nor pre- and post-condition clauses. That’s okay, though - nothing stopping you from simply writing those as at-runtime assert() statements, or even just informal comment blocks. Jon Bentley’s articles on writing correct code, which he collated into his book Programming Pearls, has great examples of stating invariants in comment blocks not to prove your code correct but to guide your thinking towards correct code.

In an interview, of course, there’s no underlying formal methods magic to verify that your invariants are right, but just like how ghost code set up a dialogue between the programmer and the verifier, doing this informally sets up a similar dialogue between the programmer and interviewer. And if you’re interviewing at a more forward-thinking company where you’re given an actual text editor and unit test suite to pass, if you write them as assert()s you’ll at least have runtime failures. Those assertion failures will also likely closer to where the bug lies, giving you a leg up on fixing the issue.

Next time: a more realistic ntos()!

Using ntos() as a function method might be enough to satisfy you, or your whiteboard interviewer. But remember from earlier that we intentionally traded a bit of performance in ntos() for writing it in a clearer, more declarative style. We’re leaving some cycles on the table that we don’t necessarily have to here! Writing implementation code in Dafny shouldn’t feel like programming in a functional language (or, worse, writing proofs on a whiteboard!); it gives us loops, mutable structures, and other traditional programming language constructs, so why shouldn’t we make use of those if we can?

Next time, we’ll leave ntos() as it was as a mere function. We’ll instead port a real-world, optimised int conversion implementation from another language to Dafny and prove that it satisfies ntos(). Hopefully this should help convince you that we don’t need to just verify silly or naive problems in Dafny, but rather that it’s expressive enough for real-world software that’s both mission-critical as well as high-performance.

We’ll also extend the “…but nobody will actually do this!” criticism from specifically interview problems and discuss the practicalities of verifying your – yes, your! – production software like this.

Your turn:

Until next time, here are some ideas to get you playing around with some of the things we did today:

  • If you’re familiar with first-order logic, you might be interested in seeing precisely what formulas Dafny generates and asks its underlying prover to validate. Using Dafny’s /proverLog command-line flag, dump the formulas for ntos() and see if you can follow along with them. (Dafny will generates tonnes of output here, so it’s best to hand it a file that contains only the parts of the file you’re actually interested in.)
  • We also saw that writing the inverse composition specification was tough because of the leading zero ambiguity. Come up with a way to extend ntos() and ston()’s specifications in order to satisfy Dafny. (One idea might be to define the notion of a “canonicalized” string, which is one with no unnecessary digits, and convince Dafny that ntos() only returns canonicalized strings.)

As always, if you do, drop me a line and let me know.

Thanks!

Thanks to Sammy Thomas and Cole Vick for feedback on early drafts of this post!