Wed, Jan 30, 2013

Last year I got interested in Test-Driven Design (TDD). I practice TDD every now and again during my work. In my spare time I tried to solve a few katas using TDD. Then I studies other people’s solutions to the same kata and discovered most of them were incorrect!

The first steps of each kata are trivial. Implementing a method that returns null, then replacing null by a constant and then inserting an if-statement. And then I get stuck. Now its time to really think about the problem and figure out a solution.

Having been educated in using formal methods to reason about problems and to formally derive an algorithm, I decided to solve one kata not using TDD. The kata I chose was the Potter Kata. It turned out to be an interesting mathematical problem.

The Potter Kata is about five different books. The price of a single book is 8 EUR. For two different books one receives a discount of 5%. For three different books the discount is 10%. For four different books the discount is 20% and for five different books the discount is 25%. The problem of the Potter Kata is to calculate the minimum price for a list of books. Each type of the five books might occur zero, one or more times in this list.

For simplicity, let us use the numbers 0 to 4 to represent the 5 books. A 5-tuple can represent the list of books.
For example, the list `[0, 0, 1, 1, 2, 2, 3, 4]`

can be represented by the 5-tuple `(2, 2, 2, 1, 1)`

.
The first element of the 5-tuple contains the number of occurrences of book 0, the second element contains the number
of occurrences of book 1, and so on.

The function `price`

determines the minimum price for a given 5-tuple. For example, `price(1,0,0,0,0) = 8`

and
`price(0, 1, 1, 0, 0) = 15.2`

.

If the 5-tuple consists of only zeroes and ones, then the price can be determined by counting the number of ones. Multiply the number of ones by eight and apply the appropriate discount.

It gets complicated when the 5-tuple contains a number larger than one. In that case we have to split the books in
two or more subsets. In the end, each of the subsets can be represented by a 5-tuple containing only zeroes and ones.
How many subsets with only zeros and ones are there? How many combinations of 5 zeroes and ones are there? 2^5 = 32.
This includes the 5-tuple `(0, 0, 0, 0, 0)`

. This empty subset can be omitted. Hence, 31 non-empty subsets are
to be considered.

So, `price(a,b,c,d,e) = min( (price(a-1,b,c,d,e) + price(1,0,0,0,0)), (price(a,b-1,c,d,e) + price(0,1,0,0,0)), ..., (price(a,b,c,d,e-1) + price(0,0,0,0,1)) )`

Of course the terms where the 5-tuple contains negative values must be omitted. These cases are the cases where
`(a,b,c,d,e)`

do not contain a specific subset.

A straightforward implementation will consume a lot of time: `price(2, 2, 2, 2, 2)`

will evaluate price(2, 2, 1, 1, 2)
two times and price(2, 2, 0, 0, 2) six times. By applying memoization this problem is solved: simply store the result
for each 5-tuple once it has been evaluated. For the 5-tuple `(a,b,c,d,e)`

the price function is evaluated at
most `(a+1)*(b+1)*(c+1)*(d+1)*(e+1)`

times.

Once I figured this out I started writing the code. Some parts I wrote using TDD, but for large parts I wrote the production code first and test afterwards. The tests gave me the opportunity to refactor the code. My solution can be downloaded here.

To represent a 5-tuple I started writing my own class. In my implementation the size of the tuple (in this case: 5)
had to be passed to the constructor. When I tried to remove the dependency on this number, I discovered that
I actually implemented a simple multiset. At that time I looked for an existing implementation and decided to use
the `Multiset`

from Google’s Guava library.

After my implementation was finished I became curious. How would other people have solved this kata? Many solutions can be found on the Internet. I studied four solutions written in Java or C#:

- https://groups.google.com/forum/?fromgroups=#!topic/software_craftsmanship/e_If0tXusyQ (The one in C# from SimoneB.)
- http://chrismcdermott.blogspot.nl/2007/01/harry-potter-kata.html
- https://bitbucket.org/cletourneau/harry-potter-kata/src
- http://geekswithblogs.net/thomasweller/archive/2009/11/08/solving-katapotter-or-what-kind-of-developer-are-you.aspx

The first three solutions turned out to be incorrect! Most of these solutions failed for the list
`[0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 4]`

, which should result in the price 78.8. Also
the list `[0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4]`

was troublesome. It should result 100 EUR.
Simone fixed her solution after I pointed out these test cases.

I get the impression that many people stop thinking about the problem too soon. They do not fully understand the problem and decide too early to stop writing tests.

I conclude that test-driven design does not automatically lead to a correct solution. Full understanding of the problem is required to produce a correct solution. I am still not convinced why I should use TDD for every line of code I write. For now, I stick to a combination of Dijkstra’s formal proofs and TDD.