20121219

Analogous function testing

For a long time I wanted to write about a testing technique I call analogous function testing. Some time ago I already wrote about the inverse function test pattern and the analogous function testing pattern is mentioned in my piece about functional hash functions.

Instead of describing some synthetic examples I can now present the pattern with a real world example. I wrote tests for a Java port of a small C library. The phonet library returns the same string for words with the same pronunciation in German. In this way it’s similar to the Soundex and Metaphone algorithms in English.

The test, I’m going to present, found bugs in the Java port. More surprisingly it also found bugs in the C library (that’s more than 10 years old) and the JNI-bridge code we used for years.

If you have a look at the C code you’ll find out that it is not trivial. Well, that’s the most polite way to say that it’s devilish hard to read. It’s beyond my means to read a code that initializes 10 static variables and has 26 automatic variables in the main function. It’s not obvious how you can port this code to Java. Luckily, there was already a Java port that is more or less analogous to the C code - it’s not very idiomatic Java.

My goal was to fix the port. I ignored performance and maintainability issues.

How who you figure out that the quite complex function behaves the same way in the original and the ported version?

The trivial test approach to explicit check values is costly and it’s hard to get good code coverage.

assertEquals("a", phonet("a"));

This way you need a lot of test code before you get some coverage. Line and branch coverage might be easy to attain, but the phonet library has 200+ rules that are evaluated by precedence. Now are we going to write x * 200 tests to check that the right values are returned? If we check all rules there are still all characters left that have to be processed consistently. When we are done with that phonet rules could change and we have to redo some of them. Not very exciting.

The trivial test approach is simply too expensive (and boring). Let’s try another approach that reduces our work to one line of code: write the test as an analogous function test.

The analogous test ensures that f(x) = g(x) for all x. f(x) is the C version and g(x) is the Java version. The only thing left is to generate x and see if f(x) == g(x) holds.

That’s the single line of code we have to write to get the test coverage. Okay, we have to write one line to check that f(x) = g(x) and some code to generate x. It all boils down to comparing the output of the C and the Java versions.

assertEquals(in, cPhonet.phonet(in), jphonet.code(in));

Unfortunately, we cannot create all values with brute force. The challenge is to find ways to generate the right input values and the quality of the test depends on the generated input values. I wrote the same test with multiple approaches each and every one useful for a facet of input values:
  • There’s a Quickcheck-based random generator test for large input
  • There’s a test with German dictionaries of known values
  • There’s a exhaustive test that generates all possible values up to a certain (small) size
As an example I’ll show you the exhaustive test for small strings. It uses Permutation.kPermutationWithRepetition(all, STRING_ADD, size) to create an Iterable that returns all permutations of size k from a set of input values with repetition of values (i.e. for the input AB and the size k = 2 it generates AA, AB, BA, BB).
@Test public void phonetDefinedCharacters() {
    Coder jphonet = createCoder();
    CPhonet cPhonet = new CPhonet();

    for (int size = 1; size <= 2; size++) {
        for (String in : kPermutationWithRepetition(
                CharEncoding.allChars(), STRING_ADD, size)) {
            assertEquals(in, cPhonet.phonet(in, rules()), jphonet.code(in));
        }
    }
}

You can have a look at the full implementation in the CPhonetVsPhonetikTest class.

As you can see from the test code the idea is simple. Following this approach will not guarantee 100% test coverage as long as you cannot generate all values. The hard part is to come up with the right test data. A pure Quickcheck-based approach is a good start, but might need to run too long to find critical bugs and additional tests are necessary.

Porting is only one class of tests to use the analogous function testing for. Another class are tests where you already have a simple implementation. You are able to verify the correctness of the simple code easily. The simple implementation is then used to test the harder implementation (with preferable non-functional characteristics).  It might be a good approach if you do not trust a piece of code, but you’re unable to read all of the source code. Just check it against a simple implementation.

A blind spot of an analogous function test is that bugs in f(x) and g(x) can mask each other. If you implement the same bug in both versions the test will not indicate the problem.

The moral from this story is that black-box tests with generated values can be a valuable tool besides classical tests and code reviews - but as everything in life it’s not perfect. It’s only one tool. Quality is still expensive.

20121212

Testing: Start with the result

If you write tests based on generated values, it’s incredible how often it is easier to start from the result value and work your way backwards to the input value (that results in return value).

One part of the story is that you should not repeat the algorithm in the test. The only thing you’re testing is that you can write the same code in two places. Every error in your reasoning is repeated. It’s natural to start with a known return value and create input values that lead to this result.

I’ll demonstrate the principle with a functionality I recently used: k-permutation with repetition. (k defines the size of the output values. Permutation indicates that we care about order. Repetition says that output values can be created by reusing input values.) For example the k-permutation with repetition for the input set {A B} and k = 3 is {AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB}. We know that the size of the permutation is n^k. (n is the size of the input set.)

@Test public void kPermutationWithRepetitions() {
    int maxSize = 6; 
    
    for(String expected : toIterable(strings(1, maxSize))) {
        Set<Character> allowed =
            newHashSet(Chars.asList(expected.toCharArray()));
        Set<Character> additional =
            anySet(characters(), 0, maxSize - allowed.size());
        allowed.addAll(additional);
    
    
        int k = expected.length();
        Iterable<String> actual =
            kPermutationWithRepetition(allowed, STRING_ADD, k);
        
        assertEquals(
            Math.pow(allowed.size(), k), Iterables.size(actual), 0);
        assertTrue(Iterables.contains(actual, expected));
    }
}

The test uses Quickcheck and Guava. It starts by generating one expected result value. The interesting input value is the set of allowed characters. This set contains all characters of the result string plus some additional characters. We have to take care that the input set is small enough, otherwise the result size is too large to run the test in practice. With this in place we can call kPermutationWithRepetition that generates the actual result. The test checks that the expected value is in the result and that the size of the permutation is correct.

This example demonstrates that it can be much easier to start with an expected return value. Depending on the situation it’s appropriate to create the entire result or just aspects of it. If you can generate at least one result value, the probability of the generated values is evenly distributed over all results and you know the total result size, your test is sufficient.