Instead of describing some synthetic examples I can now present the pattern with a real world example. I wrote tests for a Java port of a small C library. The phonet library returns the same string for words with the same pronunciation in German. In this way it’s similar to the Soundex and Metaphone algorithms in English.
The test, I’m going to present, found bugs in the Java port. More surprisingly it also found bugs in the C library (that’s more than 10 years old) and the JNI-bridge code we used for years.
If you have a look at the C code you’ll find out that it is not trivial. Well, that’s the most polite way to say that it’s devilish hard to read. It’s beyond my means to read a code that initializes 10 static variables and has 26 automatic variables in the main function. It’s not obvious how you can port this code to Java. Luckily, there was already a Java port that is more or less analogous to the C code - it’s not very idiomatic Java.
My goal was to fix the port. I ignored performance and maintainability issues.
How who you figure out that the quite complex function behaves the same way in the original and the ported version?
The trivial test approach to explicit check values is costly and it’s hard to get good code coverage.
assertEquals("a", phonet("a"));
This way you need a lot of test code before you get some coverage. Line and branch coverage might be easy to attain, but the phonet library has 200+ rules that are evaluated by precedence. Now are we going to write x * 200 tests to check that the right values are returned? If we check all rules there are still all characters left that have to be processed consistently. When we are done with that phonet rules could change and we have to redo some of them. Not very exciting.
The trivial test approach is simply too expensive (and boring). Let’s try another approach that reduces our work to one line of code: write the test as an analogous function test.
The analogous test ensures that f(x) = g(x) for all x. f(x) is the C version and g(x) is the Java version. The only thing left is to generate x and see if f(x) == g(x) holds.
That’s the single line of code we have to write to get the test coverage. Okay, we have to write one line to check that f(x) = g(x) and some code to generate x. It all boils down to comparing the output of the C and the Java versions.
assertEquals(in, cPhonet.phonet(in), jphonet.code(in));
Unfortunately, we cannot create all values with brute force. The challenge is to find ways to generate the right input values and the quality of the test depends on the generated input values. I wrote the same test with multiple approaches each and every one useful for a facet of input values:
- There’s a Quickcheck-based random generator test for large input
- There’s a test with German dictionaries of known values
- There’s a exhaustive test that generates all possible values up to a certain (small) size
@Test public void phonetDefinedCharacters() { Coder jphonet = createCoder(); CPhonet cPhonet = new CPhonet(); for (int size = 1; size <= 2; size++) { for (String in : kPermutationWithRepetition( CharEncoding.allChars(), STRING_ADD, size)) { assertEquals(in, cPhonet.phonet(in, rules()), jphonet.code(in)); } } }
You can have a look at the full implementation in the CPhonetVsPhonetikTest class.
As you can see from the test code the idea is simple. Following this approach will not guarantee 100% test coverage as long as you cannot generate all values. The hard part is to come up with the right test data. A pure Quickcheck-based approach is a good start, but might need to run too long to find critical bugs and additional tests are necessary.
Porting is only one class of tests to use the analogous function testing for. Another class are tests where you already have a simple implementation. You are able to verify the correctness of the simple code easily. The simple implementation is then used to test the harder implementation (with preferable non-functional characteristics). It might be a good approach if you do not trust a piece of code, but you’re unable to read all of the source code. Just check it against a simple implementation.
A blind spot of an analogous function test is that bugs in f(x) and g(x) can mask each other. If you implement the same bug in both versions the test will not indicate the problem.
The moral from this story is that black-box tests with generated values can be a valuable tool besides classical tests and code reviews - but as everything in life it’s not perfect. It’s only one tool. Quality is still expensive.