Analogous function testing

For a long time I wanted to write about a testing technique I call analogous function testing. Some time ago I already wrote about the inverse function test pattern and the analogous function testing pattern is mentioned in my piece about functional hash functions.

Instead of describing some synthetic examples I can now present the pattern with a real world example. I wrote tests for a Java port of a small C library. The phonet library returns the same string for words with the same pronunciation in German. In this way it’s similar to the Soundex and Metaphone algorithms in English.

The test, I’m going to present, found bugs in the Java port. More surprisingly it also found bugs in the C library (that’s more than 10 years old) and the JNI-bridge code we used for years.

If you have a look at the C code you’ll find out that it is not trivial. Well, that’s the most polite way to say that it’s devilish hard to read. It’s beyond my means to read a code that initializes 10 static variables and has 26 automatic variables in the main function. It’s not obvious how you can port this code to Java. Luckily, there was already a Java port that is more or less analogous to the C code - it’s not very idiomatic Java.

My goal was to fix the port. I ignored performance and maintainability issues.

How who you figure out that the quite complex function behaves the same way in the original and the ported version?

The trivial test approach to explicit check values is costly and it’s hard to get good code coverage.

assertEquals("a", phonet("a"));

This way you need a lot of test code before you get some coverage. Line and branch coverage might be easy to attain, but the phonet library has 200+ rules that are evaluated by precedence. Now are we going to write x * 200 tests to check that the right values are returned? If we check all rules there are still all characters left that have to be processed consistently. When we are done with that phonet rules could change and we have to redo some of them. Not very exciting.

The trivial test approach is simply too expensive (and boring). Let’s try another approach that reduces our work to one line of code: write the test as an analogous function test.

The analogous test ensures that f(x) = g(x) for all x. f(x) is the C version and g(x) is the Java version. The only thing left is to generate x and see if f(x) == g(x) holds.

That’s the single line of code we have to write to get the test coverage. Okay, we have to write one line to check that f(x) = g(x) and some code to generate x. It all boils down to comparing the output of the C and the Java versions.

assertEquals(in, cPhonet.phonet(in), jphonet.code(in));

Unfortunately, we cannot create all values with brute force. The challenge is to find ways to generate the right input values and the quality of the test depends on the generated input values. I wrote the same test with multiple approaches each and every one useful for a facet of input values:
  • There’s a Quickcheck-based random generator test for large input
  • There’s a test with German dictionaries of known values
  • There’s a exhaustive test that generates all possible values up to a certain (small) size
As an example I’ll show you the exhaustive test for small strings. It uses Permutation.kPermutationWithRepetition(all, STRING_ADD, size) to create an Iterable that returns all permutations of size k from a set of input values with repetition of values (i.e. for the input AB and the size k = 2 it generates AA, AB, BA, BB).
@Test public void phonetDefinedCharacters() {
    Coder jphonet = createCoder();
    CPhonet cPhonet = new CPhonet();

    for (int size = 1; size <= 2; size++) {
        for (String in : kPermutationWithRepetition(
                CharEncoding.allChars(), STRING_ADD, size)) {
            assertEquals(in, cPhonet.phonet(in, rules()), jphonet.code(in));

You can have a look at the full implementation in the CPhonetVsPhonetikTest class.

As you can see from the test code the idea is simple. Following this approach will not guarantee 100% test coverage as long as you cannot generate all values. The hard part is to come up with the right test data. A pure Quickcheck-based approach is a good start, but might need to run too long to find critical bugs and additional tests are necessary.

Porting is only one class of tests to use the analogous function testing for. Another class are tests where you already have a simple implementation. You are able to verify the correctness of the simple code easily. The simple implementation is then used to test the harder implementation (with preferable non-functional characteristics).  It might be a good approach if you do not trust a piece of code, but you’re unable to read all of the source code. Just check it against a simple implementation.

A blind spot of an analogous function test is that bugs in f(x) and g(x) can mask each other. If you implement the same bug in both versions the test will not indicate the problem.

The moral from this story is that black-box tests with generated values can be a valuable tool besides classical tests and code reviews - but as everything in life it’s not perfect. It’s only one tool. Quality is still expensive.


Testing: Start with the result

If you write tests based on generated values, it’s incredible how often it is easier to start from the result value and work your way backwards to the input value (that results in return value).

One part of the story is that you should not repeat the algorithm in the test. The only thing you’re testing is that you can write the same code in two places. Every error in your reasoning is repeated. It’s natural to start with a known return value and create input values that lead to this result.

I’ll demonstrate the principle with a functionality I recently used: k-permutation with repetition. (k defines the size of the output values. Permutation indicates that we care about order. Repetition says that output values can be created by reusing input values.) For example the k-permutation with repetition for the input set {A B} and k = 3 is {AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB}. We know that the size of the permutation is n^k. (n is the size of the input set.)

@Test public void kPermutationWithRepetitions() {
    int maxSize = 6; 
    for(String expected : toIterable(strings(1, maxSize))) {
        Set<Character> allowed =
        Set<Character> additional =
            anySet(characters(), 0, maxSize - allowed.size());
        int k = expected.length();
        Iterable<String> actual =
            kPermutationWithRepetition(allowed, STRING_ADD, k);
            Math.pow(allowed.size(), k), Iterables.size(actual), 0);
        assertTrue(Iterables.contains(actual, expected));

The test uses Quickcheck and Guava. It starts by generating one expected result value. The interesting input value is the set of allowed characters. This set contains all characters of the result string plus some additional characters. We have to take care that the input set is small enough, otherwise the result size is too large to run the test in practice. With this in place we can call kPermutationWithRepetition that generates the actual result. The test checks that the expected value is in the result and that the size of the permutation is correct.

This example demonstrates that it can be much easier to start with an expected return value. Depending on the situation it’s appropriate to create the entire result or just aspects of it. If you can generate at least one result value, the probability of the generated values is evenly distributed over all results and you know the total result size, your test is sufficient.


My little stupid Raspberry Pi project: 1 LED and 1 Switch

I recently bought a Raspberry Pi. My goal is to build a living room mp3 player. As I’m a software guy the programming part isn’t a problem, but the mp3 player has to be able to respond to some buttons having a semantic like: “Hey, next song please!”. So I have to do some simple hardware. Sadly, my knowledge about electronics has faded. This is the journey of electronics ignorant in hardware land.

As the a first step, I put two examples from the raspbian user guide together. Pressing a switch toggles a LED, enabling and disabling it. Right now I’ve no idea how to calculate the pull-up-resistor for the switch and the resistor for the LED. I was happy to see that it simply worked.

Parts: Breadboard, 10k Ohm, 150 Ohm, Switch, 1 Led, Jumper Wires

What I learnt so far

There are four and five band resistor color codings. One half of the resistors I use are four band coded and the other half five band coded. Man, this is like software world with downward compatibility. Can’t they just use five bands?

The world outside of a computer has x,y,z axis. There is a difference between nice, short and too f***ing short cables. Because they had only male jumper wires at my store, things are a bit improvised anyway. I had to fit an end connector pin on the male wire. The guy at the shop said I should solder it on. Yeah right, the first thing I’ll do is soldering; adding some shrink tubing is also a good idea. The x,y,z problem also applies to switches. They actually have to fit into the Breadboard.

Software-wise the project is very simple. I use the raspberry pi GPIO python library. I read the state of the switch from port 12 and enable the LED on the first and disable the LED on the second state change.

import RPi.GPIO as GPIO
import time
GPIO.setup(11, GPIO.OUT)
GPIO.setup(12, GPIO.IN)
state = False

def wait():
while True:
 wait() #don't burn the CPU
 if not GPIO.input(12):
    print state
    state = not state
    GPIO.output(11, state)
    while not GPIO.input(12):

Not very surprisingly it’s a good idea to not poll the input port constantly. Otherwise the python process would use 100% of the available CPU time. I don’t know yet how to replace the polling with GPIO interrupts.

I think I won the first round of the hardware game:
a) It works.
b) My Raspberry is still alive.
c) It’s quite clear I have to learn and practice a lot.


How to restore the subversion history of a file

Every now and then the history of a file in a subversion repository is lost. I’ll describe how you can restore the history.

If you do a svn copy the file history of the new location shares the history of the old location. This is way subversion supports svn move and branches. Restoring a file’s history is only a matter of copying the right file. Given the file $A_FILE to restore, the new file location $B_FILE and the last revision $A_REV of  $A_FILE (before $A_FILE was removed) the operations to restored the history of $A_FILE are:

svn delete $B_FILE
svn cp $URL/$A_FILE@$A_REV $B_FILE
svn cat $URL/$B_FILE > $B_FILE

After you restored $A_FILE’s history all changes directly to $B_FILE are lost. Only the content of the file $B_FILE survives.

Here’s a commented script demonstrates the problem and show how to fix it:
#!/bin/bash -ex

#do everything in a working directory that's removed after the script ran
mkdir -p work
WORK=$(readlink -f work)
cd "$WORK"

#setup a repository
REPOSITORY=$(readlink -f repo)
svnadmin create "$REPOSITORY"

#checkout the local workspace
svn co $URL workspace
cd workspace


#create the file to restore
echo $A_FILE content > $A_FILE
svn add $A_FILE
svn commit -m"initial version of $A_FILE"
svn update

#history of the file is lost
svn delete $A_FILE
svn commit -m"deleted $A_FILE"

#the new file
echo $B_FILE content > $B_FILE
svn add $B_FILE
svn commit -m"initial version of $B_FILE"
svn update

#restore the file history and keep the file content
svn delete $B_FILE
svn cp $URL/$A_FILE@$A_REV $B_FILE
svn cat $URL/$B_FILE > $B_FILE
svn status
svn commit -m"history from $A_FILE in new path $B_FILE"

#check the result
svn update && svn info
svn log --diff $URL/$B_FILE
svn cat $URL/$B_FILE
svn diff $URL/$B_FILE $URL/$B_FILE@$B_REV


My best Python HTTP test server so far

I’ve implemented a bunch of test HTTP servers in Python and now I think I’ve got the implementation right:
  • Test client code doesn’t have to care about the specific HTTP port. Any free port can be used by the test without interference with other running processes.
  • The HTTP server is up and running when the test is ready.
  • Resource handling uses the the with statement. The HTTP server is shut-down at the end of the test.
  • The concrete request urls (host and port) are transparent for the test.
  • Test can be run fully in memory. The only resource allocated is the HTTP socket.
The actual test is brief. We call the http_server with a HTTP handler and the function url is returned. The test can use the url function to create the request url. This is handy as the allocated port of the HTTP server is not fixed. In the test below we check that the returned content matches.
with http_server(Handler) as url:
    assert list(urllib2.urlopen(url("/resource"))) == [content]
To run this test we need a HTTP handler implementation. You could use the SimpleHTTPServer.SimpleHTTPRequestHandler that comes with python and work with files served from a directory. This is any good point to start but setting up a test folder with the necessary content is cumbersome and inflexible.

This handler runs in memory without any additional setup. It will always returns with the 200 response code writes the content into the request.
code, content = 200, "Ok"
class Handler(BaseHTTPServer.BaseHTTPRequestHandler):
  def do_GET(self):
      self.wfile.write("\n" + content )
The http_server implementation starts a thread, opens a socket and yields the url function. The HTTP request handler runs in the spawned thread.
def http_server(handler):
  def url(port, path):
      return 'http://%s:%s%s' % (socket.gethostname(), port, path)
  httpd = SocketServer.TCPServer(("", 0), handler)
  t = threading.Thread(target=httpd.serve_forever)
  port = httpd.server_address[1]
  yield functools.partial(url, port)
I leave it as an exercise to you to write an implementation that reuses the HTTP server in multiple tests. This could be necessary if the overhead of allocating ports dominates the test running time.


Convert videos to wav and mp3 audio files

On the one hand, I listen almost daily to a fluctuating list of podcasts - mostly while commuting and jogging. I find it very relaxing. On the other hand, I have an ever increasing list of presentation videos I would like to watch, but I don’t actually do it. For some talks and most interviews the video is nice but not necessary. The audio is sufficient to get the information and it’s handy to be able to listen to interesting stuff on the run.

The following bash script downloads the video from youtube and encodes the audio into an low quality mp3 file.

#!/bin/bash -e
cd $(mktemp -d youtubedownXXXX)
youtube-dl -f worst -t $1
name=$(echo *)
ffmpeg -i $name -vn -acodec pcm_s16le -ar 44100 -ac 2 $name.wav
lame -m m -q 0 --vbr-new -B 64 $name.wav $name.mp3

The script depends on youtube-dl, ffmpeg and lame. It creates a temporary folder where it puts the downloaded video, wav and mp3 files.


Archiving twitter messages without shortened URLs with Python Twitter Tools

Version 1.9  of the Python Twitter Tools  include the new feature to follow redirects of tweeted urls. The developer behind twitter tools Mike Verdone accepted my patch.

The goal is to archive readable URLs in the tweet archive by replacing all URLs from shortening services. You can run it with twitter-archiver -f ThomasAJung to follow all links or twitter-archiver -r bit.ly,t.co,goo.gl ThomasAJung to follow the links of selected hosts.

The archived output changes from:
228771182638419968 2012-07-27 10:37:54 CEST Believe it or not there are people who see interop as a bad thing. It interferes with their business model, ... http://t.co/AhAg41x9

228771182638419968 2012-07-27 10:37:54 CEST Believe it or not there are people who see interop as a bad thing. It interferes with their business model, ... http://scripting.com/stories/2012/07/26/oauth1IsFine.html

You can read the Shortcomings section of the Wikipedia article about URL shortening if you are interested why you should replace the shortened URLs.


Using Bash Shell Parameter Expansions

Knowing shell parameter extensions is quite handy to manipulate parameters in Bash. You can remove and replace characters, select substrings, check if a variable is present, find variables and convert strings to upper and lower case.

A simple example is to emulate the behavior of the coreutils commands dirname and basename. After setting a variable FILE=/tmp/a.txt the output of dirname $FILE and echo ${FILE%/*} is /tmp. We can also create a parameter expansion that emulates basename $FILE that returns a.txt: echo ${FILE##*/}.

As you can see from these two examples parameter expansions are powerful. It’s also obvious that the parameter expansions are quite cryptic. They do not use a syntax that is used elsewhere and are hard to remember. The advantage of parameter expansions is that you can use them directly for ad hoc scripting in the shell.

The remainder of this blog demonstrates all parameter expansions with examples. The Bash manual contains a complete description of parameter expansions.


Lines starting with $ contain the command line input and lines starting with > contain the output of commands. Each section contains a shell session - all variables have the scope of this section.

Get the value of a variable
$ A=value
$ echo $A
> value
$ echo ${A}
> value

Assign variable if null
$ echo $F
$ echo ${F:=A}
> A
$ echo ${F:=B}
> A
$ echo $F
> A
Substitute variable if null
$ echo $F
$ echo ${F:-A}
> A
$ echo $F
$ G=B
$ echo ${G:-A}
> B
Exit shell if variable is null
$ bash -c "echo ${A:?died}"
> bash: A: died
$ echo $?  #return code
> 1
$ A=ok
$ bash -c "echo ${A:?died}"
> ok
$ echo $?  #return code
> 0

Substitute variable if not null
$ echo ${F:+A}
$ F=G
$ echo ${F:+A}
> A

Substrings from index
$ A=abcdefg
$ echo ${A:1}
> bcdefg
$ echo ${A:1:3} #with optional length parameter
> bcd

Length of a string
$ A=abc
$ echo ${#A}
> 3
Delete from left
$ A=abcdefabc

$ echo ${A#a}
> bcdefabc

$ echo ${A#ab}
> cdefabc
$ echo ${A#b} #no match
> abcdefabc
$ echo ${A#?b}
> cdefabc
$ echo ${A#?c} #no match
> abcdefabc
$ echo ${A#*c}
> defabc

$ echo ${A#*b} #shorted match
> cdefabc
$ echo ${A##*b} #longest match
> c
Delete from right
$ A=abcdefabc

$ echo ${A%c}
> abcdefab

$ echo ${A%b*} #shortest match
> abcdefa
$ echo ${A%%b*} #longest match
> a

Replace variable value

$ echo ${A/a/x}
> xbcdefabc
$ echo ${A/b/x}
> axcdefabc
$ echo ${A/#a/x} #from left
> xbcdefabc
$ echo ${A/#b/x}
> abcdefabc
$ echo ${A/%c/x} #from right
> abcdefabx
$ echo ${A/%b/x}
> abcdefabc
$ echo ${A/b*b/x}
> axc
$ echo ${A//b/x} #replace all
> axcdefaxc
$ echo ${A//b}
> acdefac

Case conversion
$ A=abc
$ echo ${A^} #upper case first
> Abc
$ echo ${A^^} #upper case all
$ echo ${B,} #lower case first
> dEF
$ echo ${B,,} #lower case all
> def

Find variable name with prefix
$ echo ${!US*}

Get the value of a parameter defined by the value of another parameter
$ A=B
$ B=b_value
$ echo ${!A} #Indirect expansion
> b_value

You should always keep in mind that the shell parameter expansions can be used with nested parameter expansion, tilde expansions, command substitutions and arithmetic expansions.

Used together with parameter expansions
$ A=abc
$ B=a
$ echo ${A#$B}
> bc
$ echo ${A#$B?}
> c

Used together with tilde expansion
$ A=./a
$ echo ${A/#./~}
> /home/user/a
Used together with command substitution
$ A=./a
$ cd /usr
$ echo ${A/#./$(pwd)}
> /usr/a

Used together with arithmetic expansion
$ echo ${A/#./$((1 + 10))}
> 11/a

After getting so far in the examples I suggest you have also a look at the other shell expansions supported by Bash.