Test-Driven Development and formal proofs

Since last year I got interested in Test-Driven Development. The idea of first writing a test before implementing new functionality appeals to me, because it ensures you ensure 100% code coverage. And having a lot of test cases makes it safe to refactor your code later. Refactoring code is needed to keep your code clean.

However, tests can only show the presence of bugs, not their absence: it is possible that your code contains bugs even if all your tests pass. You can only proof that your code has no bugs using tests if you write tests that cover the complete state space of your application. This is practically infeasible. For example, for a sorting algorithm that sorts arrays of numbers, you need tests for each array length, and for each array length you need tests for all possible values of each array element. Even if you succeed in creating all these tests, executing them will take centuries!

Correctness proofs

To actually proof the correctness of a sorting algorithm, you only need a few tens of lines of formal proof. That is what I learned at the Eindhoven University of Technology. I have studied the techniques introduced by prof. E.W. Dijkstra to proof the correctness of algorithms.

Proving an algorithm is a time-consuming and difficult job. It is possible that you make a mistake in your proof without noticing it. You might convince yourself about the correctness unjustified. Therefore I write tests for my code just in case I made a mistake in my formal proof.

Transformation Priority Premise

Robert Martin (Uncle Bob) makes an interesting statement in his blog The Transformation Priority Premise. In this blog he writes how he constructs an algorithm step by step. Each step the algorithm becomes more generic. He makes the following statement: “The sequence of tests, transformations, and refactorings may just be a formal proof of correctness.”

Combining Transformation Priority Premise and formal proofs

The steps described by Martin in themselves have nothing to do with formal proofs of correctness as taught by Dijkstra. But maybe it is possible to combine the two techniques. Maybe the Test Drive Design promoted by Martin helps creating a formal proof in Dijkstra’s style step by step.

As an exercise I tried to formally proof the algorithm made by Martin in the primes kata. The goal of this kata is to create an algorithm that determines the prime factors of a natural number (larger than one). Here is the final algorithm derived by Martin:

public List<Integer> generate(int n) {
    List<Integer> primes = new ArrayList<Integer>();
    for (int candidate = 2; n > 1; candidate++)
        for (; n%candidate == 0; n/=candidate)
return primes;

It was quite easy to proof the correctness for the first few steps. But as soon as an if-statement is replaced by a loop in the algorithm, the formal proof becomes complex.You have to introduce an invariant for the loop and also proof that the loop is guaranteed to terminate.

While I was proving the final version of Martin’s algorithm I got worried. How could seven tests convince him that the algorithm was correct for all possible inputs? And how could seven tests convince him that the two nested loops would eventually terminate for each possible input? You need knowledge about prime numbers to proof this.

So, I did not succeed in combining formal proofs with the sequence of tests, transformations and refactorings proposed by Martin. Perhaps someone else succeeds in combining these two methods. Perhaps this is an interesting subject for a final thesis of a student or a PHD student?


I believe formal proofs are the best way to convince yourself about the correctness of an algorithm. Because constructing proofs is difficult and time-consuming, tests help in getting confidence in the correctness of your algorithm. I prefer combining both techniques to get the most confidence in my algorithms.

Addition made at November 19, 2012:

Robert Martin published a beautiful comic-like explanation about how applying the Transformation Priority results in the Quicksort algorithm. I was not convinced about the correctness because of the 7 simple test cases supplied by Martin. And indeed, the resulting algorithm is incorrect! The simple test [1,1] -> [1,1] fails, because the final implementation returns [1]. The variable EQ should have actually been a list, just as LT and GT.

Actually, the algorithm is also incorrect because the first test ([] -> []) fails at the statement EQ = L[0].

The correct algorithm looks like this:

    IF L == [] RETURN L
    LT = LIST
    EQ = LIST
    GT = LIST
    FOR E IN L
        IF E < L[0] LT.ADD(E)
        IF E == L[0] EQ.ADD(E)
        IF E > L[0] GT.ADD(E)