Alternative test approach: Quickcheck

With this article I would like to shed some light on the question: Why should I implement tests with the specification-based testing approach proposed by Quickcheck? I will show that the approach yield tests with better readability, more test coverage, helps to omit mock tests and manually maintained test fixtures.

I'll illustrate the approach with an simple example taken from the Guava library to be able to discuss the pros and cons. The example I picked is the test of the com.google.common.base.Splitter class. I present three implementations of the Splitter tests. The test implementation as it can be found in Guava today (translated to TestNG), a test implementation with TestNG @DataProvider and an implementation with Quickcheck for Java.

The following code are tests as they are today. The tests check the behavior of simple split operations for different input values. These tests have a lot of code but are structurally very similar.

@Test public void testCharacterSimpleSplit() {
  String simple = "a,b,c";
  Iterable<String> letters = Splitter.on(',').split(simple);
  assertContentsInOrder(letters, "a", "b", "c");
}

@Test public void testCharacterSplitWithDoubleDelimiter() {
  String doubled = "a,,b,c";
  Iterable<String> letters = Splitter.on(',').split(doubled);
  assertContentsInOrder(letters, "a", "", "b", "c");
}

@Test public void testCharacterSplitWithDoubleDelimiterAndSpace() {
  String doubled = "a,, b,c";
  Iterable<String> letters = Splitter.on(',').split(doubled);
  assertContentsInOrder(letters, "a", "", " b", "c");
}

@Test public void testCharacterSplitWithTrailingDelimiter() {
  String trailing = "a,b,c,";
  Iterable<String> letters = Splitter.on(',').split(trailing);
  assertContentsInOrder(letters, "a", "b", "c", "");
}

@Test public void testCharacterSplitWithLeadingDelimiter() {
  String leading = ",a,b,c";
  Iterable<String> letters = Splitter.on(',').split(leading);
  assertContentsInOrder(letters, "", "a", "b", "c");
}

@Test public void testCharacterSplitWithMulitpleLetters() {
  Iterable<String> testCharacteringMotto =
     Splitter.on('-').split("Testing-rocks-Debugging-sucks");
     assertContentsInOrder(
      testCharacteringMotto, "Testing", "rocks", "Debugging", "sucks");
}
 
private void assertContentsInOrder(Iterable<String> actual,
     String... expected) {
  assertEquals(Arrays.asList(expected), Lists.newArrayList(actual));
}

From the readability perspective the problem with tests specifying concrete values is that the reader has to infer which part of the information is significant and which part is not. For example taking this test in isolation:

@Test public void testCharacterSimpleSplit() {
  String simple = "a,b,c";
  Iterable<String> letters = Splitter.on(',').split(simple);
  assertContentsInOrder(letters, "a", "b", "c");
}

One could come to the result that:
- split works only with one, alphanumeric character (“aa,bb” would not be a legal input string)
- the separator may not be an alphanumeric character
- splits works only for successive char values as the input is a - b - c
- all other characters are undefined

The reader of the tests has to digest all tests to be able to infer the actual properties of the Splitter. While reading he has also to remove the insignificant information and has to eliminate the invalid assumptions. (Actually, the tests say nothing about alphanumeric separators. So this assumption is still open.) As we know what split does this is trivial but for unknown functionality this is much harder.

The test can be implemented with the TestNG DataProviders in a more concise way:

static final String SPLIT = "split";

@DataProvider(name = SPLIT)
public Object[][] split() {
  Object[] simpleSplit = {"a,b,c", ",", new String[] { "a", "b", "c" } };
  Object[] splitWithDoubleDelimiter = {
    "a,,b,c", ",", new String[] { "a", "", "b", "c" } };
  Object[] splitWithDoubleDelimiterAndSpace = {
    "a,, b,c", ",", new String[] { "a", "", " b", "c" } };
  Object[] splitWithTrailingDelimiter = {
    "a,b,c,", ",", new String[] { "a", "b", "c", "" } };
  Object[] splitWithLeadingDelimiter = {
    ",a,b,c", ",", new String[] { "", "a", "b", "c" } };
  Object[] splitWithMulitpleLetters = {
    "Testing-rocks-Debugging-sucks", "-", new String[] { "Testing", "rocks", "Debugging", "sucks" } };
  return new Object[][] {
    simpleSplit, splitWithDoubleDelimiter, splitWithDoubleDelimiterAndSpace,
    splitWithTrailingDelimiter, splitWithLeadingDelimiter, splitWithMulitpleLetters };
}

@Test(dataProvider = SPLIT)
public void simpleSplit(String simple, String separator, String[] expected) {
  Iterable<String> letters = Splitter.on(separator).split(simple);
  assertEquals(Arrays.asList(expected), Lists.newArrayList(letters));
}

This test removes a lot of the boiler-plate in the original version. It tests the split method repeatedly with a set of input values. The more compact representation of the test helps as long as we are keeping test input data and the test in one place. (If you organize your source file in a way that the test data and test is separated the readability decreases significantly.) This test approach still suffers from the readability problems the initial test has: the reader has to derive the property of the Splitter implementation from the input and expected return values. This is like inferring the properties of multiplication from a set of input and return values.

An completely different approach can be implemented using Quickcheck for Java for API. Quickcheck is build on the basic abstraction of the Generator<T>. The only method of the Generator<T> interface is next(). It returns a new value of type T.

The Splitter test uses Quickcheck functionality to create values:
- PrimitiveGenerators.strings() is a Generator<String> for undefined string values
- CombinedGeneratorIterators.someNonEmptyLists() is similar to a Generator<List<T>> but it is adapted to Iterable<List<T>> to be able to use it in a for-each loop. The lists returned have at least one element.

Basic idea is to specify a test as follows: If you take any non empty list of words, join them with a distinct separator and then split the joined string the result will be the input list of words again.

@Test public void simpleSplit() {
  for (List<String> words : someNonEmptyLists(strings())) {
    char separator = anyDistinctCharacter(words);
    String input = Joiner.on(separator).join(words);
    Iterable<String> letters = Splitter.on(separator).split(input);
    assertEquals(words, Lists.newArrayList(letters));
  }
}

The test uses the anyDistinctCharacter helper method to create separators. Separators have to be distinct from the characters used in words otherwise the splitter splits words. The helper uses the PrimitiveGenerators.characters() factory method to create a Generator<Character> for arbitrary characters. To create the separator values CombinedGenerator.excludeValues generator is used. It is build on a input generator but skips all generated values that are equal to the excluded values. (It skips all characters that are in words.)

private char anyDistinctCharacter(List<String> words) {
    char[] notAllowedAsSeperator = Joiner.on("").join(words).toCharArray();
    return excludeValues(characters(), asList(notAllowedAsSeperator)).next();
}

This specification-based test describes the contract and does not enumerate the specific test values. Compared to the two other implementations the way the test is written is more rigid. This is a positive thing, as I already tried to explain that when composing a test of concrete values there is place for misinterpretations. The other effect is that six tests boil down to one test.

This test approach has of course it's trade-offs: it depends on join and is not deterministic.
- Join has to be tested so that this test is working. I think the join test is much easier than the split test. Once join is tested we can depend on it. By testing join/split together we can check that these two are symmetric (f(f-1(x)) = x). Testing symmetry comes for free with this implementation approach.
- The real concern is that we will run the test with randomly generated test data and it is not guaranteed to contain all of the enumerated test cases in every test run.

If non-determinism is your concern there are some ways to address this.
- You can use a different distributions function in your generators. Using different distributions will make certain values more or less likely to occur.
- You can make the values used deterministic by using the CombinedGenerators.ensureValues(ensuredValues). This is still better than the tests with concrete values as you only have to specify the input values and not the expected result values.
- You can combine deterministic and random values. The generator created with CombinedGenerators.ensureValues(ensuredValues, otherValues) uses the enumerated values first and then the otherValues generator after the explicit values are exhausted.
The random nature of the generated values has of course an advantage: you may find bugs that you did not anticipate while writing the test.

Primitive type, enumerations and collection generators alone are not very useful. To test a real world application you have to be able to create values of your custom types. In the Splitter example we already used generators provided by the Quickcheck implementation.

Here is a simple example to get you started.

class Name {
  private final String first;
  private final String last;

  public Name(String first, String last) {
    super();
    this.first = first;
    this.last = last;
  }
  public String getLast() { return last; }
  public String getFirst() { return first; }
}

class NameGenerator implements net.java.quickcheck.Generator<Name>{
  Generator<String> first = PrimitiveGenerators.strings();
  Generator<String> last = PrimitiveGenerators.strings();

  @Override public Name next() {
    return new Name(first.next(), last.next());
  }
}

The code defines the custom type Name and a Generator<Name> using generator implementations provided by the Quickcheck framework. This is analogous to the definition of types in Java you define your own types based on the primitive and other types provided by the JDK.

The test then checks one property of the Name type. This test fails with the current implementation of the Name but this is easily fixed.

public class NameTest {
  @Test public void equals(){
    for(Name name : Iterables.toIterable(new NameGenerator())){
      assertEquals(name, new Name(name.getFirst(), name.getLast()));
    }
  }
}

The test uses an adapter to the Iterable interface to be able to use the generator in a for expression. This is one way to run the test. You can also use an inner class with a more capable runner (QuickCheck.forAll(generator, characteristic)). Using the for expression is the trade-off given that the Java language has no closures yet.

Analogous to your domain object hierarchy you can now build a generator hierarchy that is used to create the domain objects. This clearly adds some overhead as you now have an additional generator for every domain object. But this pays off as you can:
- use generators in unit tests.
- use generated values to replace hard coded fixtures in integration tests (for example tests that you dbunit).
- omit mock tests that are only in place because the object graph to start a test is to hard to setup manually
- use generated values to write inverse functions (f-1(f(x)) = x) and analogous functions (f(x) = g(x) where f or g is simpler) tests (for example zip, encryption, read/write, sorting, searching, etc.).

I hope I could convince you that the approach helps you with writing more expressive tests by leveraging some computing power. You can use specification-based testing approach to test a range of software from the boring business software to more interesting areas like data-structures and algorithms. Maybe you could start by writing a test for the join method using Quickcheck?

Thomas Andreas Jung's Blog

20101108