20130527

Speed-up your project build without the tmpfs hassle

Running all tests in my current project takes some time. With 26 GB of free memory, why not use it for something useful? tmpfs is one way to speed up the test execution by keeping a complete file system in memory.

The problem with tmpfs is that it's only kept in memory. You have to set up the scripts that flush content back to disk yourself. These scripts had better work perfectly, otherwise you'll lose parts of your work.

A common approach is to work directly in a tmpfs folder and back up your work to a folder on disk. When your machine boots you restore the tmpfs folder from this backup folder. Once booted, cron is used to sync the tmpfs folder and the disk folder.

I found this setup a bit complicated and error-prone. I never really trusted my own setup at boot time and with cron. Now I use a much simpler setup that doesn't use cron at all.

The performance on my machine was always reasonable for running a single test, using the IDE and deploying to a web server. Just running all tests took too much time.

The sweet spot I found is to set up a workspace on disk, sync it to tmpfs under /dev/shm and run all tests there. This keeps my setup more or less unchanged and removes the possibility of losing work just because I'm too dumb to set things up correctly.

The resulting performance increase is reasonable:

$ nosetests && run_tests.py
........................................................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 248 tests in 107.070s

OK
........................................................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 248 tests in 19.423s

OK

It's now five times faster than before.

With Python the setup is quite simple:

#!/bin/bash -e

WORK=src/py
LOG=$(pwd)/test.log
TARGET=$(hg root)
SHADOW=/dev/shm/shadow/$TARGET

date > "$LOG"
mkdir -p "$SHADOW"

cd "$SHADOW"
# mirror the repository into tmpfs; --delete removes files that are gone in the source
rsync --update --delete --exclude=".*" --exclude=ENV --archive "$TARGET" ./..

# create the virtualenv on the first run only
if [ ! -d ENV ]
then
    virtualenv ENV
fi
. ENV/bin/activate

cd "$WORK"
python setup.py develop >> "$LOG"
nosetests "$@" | tee -a "$LOG"
exit ${PIPESTATUS[0]}

I'm just rsyncing into a /dev/shm folder, setting up the test environment there (virtualenv and python setup.py) and running the tests (nosetests).

It's still possible to run single tests from the command line in the tmpfs folder. It's also possible to kick this off from your IDE, but you'll lose your test runner and debugging capabilities. As I said earlier, I don't need these right now.

I hope my little twist with tmpfs helps with setting up a faster development environment without all the scripting hassle.

20130501

How to use the Mercurial Rebase extension to collapse and move change sets

Every now and then I like to combine several change sets into one. You can do this with the rebase extension. I will show you how rebase works with some examples.

The other major use case for the rebase extension is to keep your Mercurial history linear when working with a team of developers. The extension gets its name from this use case: changing the parent change set of your changes to another change set (often the tip). As long as every developer uses pull, rebase, push the history is free of merge change sets.

I set up a repository with three change sets.

hg init repo
cd repo

echo a > a
hg commit -A -m"add a"
ID_A=$(hg id -n)

echo b > b
hg commit -A -m"add b"
ID_B=$(hg id -n)

echo c > c
hg commit -A -m"add c"
ID_C=$(hg id -n)

hg version | head -n 1 ; echo
hg glog --patch

Mercurial Distributed SCM (version 2.5.4)

@  changeset:   2:aa77f078beed
|  tag:         tip
|  user:        blob79
|  date:        Wed May 01 12:14:27 2013 +0200
|  summary:     add c
|
|  diff -r a692ab61aad8 -r aa77f078beed c
|  --- /dev/null        Thu Jan 01 00:00:00 1970 +0000
|  +++ b/c      Wed May 01 12:14:27 2013 +0200
|  @@ -0,0 +1,1 @@
|  +c
|
o  changeset:   1:a692ab61aad8
|  user:        blob79
|  date:        Wed May 01 12:14:27 2013 +0200
|  summary:     add b
|
|  diff -r e31426a230b0 -r a692ab61aad8 b
|  --- /dev/null        Thu Jan 01 00:00:00 1970 +0000
|  +++ b/b      Wed May 01 12:14:27 2013 +0200
|  @@ -0,0 +1,1 @@
|  +b
|
o  changeset:   0:e31426a230b0
   user:        blob79
   date:        Wed May 01 12:14:27 2013 +0200
   summary:     add a

   diff -r 000000000000 -r e31426a230b0 a
   --- /dev/null        Thu Jan 01 00:00:00 1970 +0000
   +++ b/a      Wed May 01 12:14:27 2013 +0200
   @@ -0,0 +1,1 @@
   +a

After collapsing the change sets $ID_B and $ID_C, a new change set containing the changes of both is in the history. This change set can then be pulled or imported as one unit, making the history of the repository a bit easier to read.

hg rebase --collapse --source $ID_B --dest $ID_A -m"collapse change sets"
hg glog --patch

saved backup bundle to /home/thomas/Desktop/rebaseblob/repo/.hg/strip-backup/a692ab61aad8-backup.hg
@  changeset:   1:bb2e0cde315f
|  tag:         tip
|  user:        blob79
|  date:        Wed May 01 12:14:28 2013 +0200
|  summary:     collapse change sets
|
|  diff -r e31426a230b0 -r bb2e0cde315f b
|  --- /dev/null        Thu Jan 01 00:00:00 1970 +0000
|  +++ b/b      Wed May 01 12:14:28 2013 +0200
|  @@ -0,0 +1,1 @@
|  +b
|  diff -r e31426a230b0 -r bb2e0cde315f c
|  --- /dev/null        Thu Jan 01 00:00:00 1970 +0000
|  +++ b/c      Wed May 01 12:14:28 2013 +0200
|  @@ -0,0 +1,1 @@
|  +c
|
o  changeset:   0:e31426a230b0
   user:        blob79
   date:        Wed May 01 12:14:27 2013 +0200
   summary:     add a

   diff -r 000000000000 -r e31426a230b0 a
   --- /dev/null        Thu Jan 01 00:00:00 1970 +0000
   +++ b/a      Wed May 01 12:14:27 2013 +0200
   @@ -0,0 +1,1 @@
   +a

If you have to keep the change sets because they were already published, you can keep the old change sets and add the new collapsed change set as a separate branch.

hg rebase --keep --collapse --source $ID_B --dest $ID_A -m"collapse change sets"
hg glog

@  changeset:   3:0fa31b1ebcf1
|  tag:         tip
|  parent:      0:448200734313
|  user:        blob79
|  date:        Wed May 01 12:14:28 2013 +0200
|  summary:     collapse change sets
|
| o  changeset:   2:0a6400bebe21
| |  user:        blob79
| |  date:        Wed May 01 12:14:28 2013 +0200
| |  summary:     add c
| |
| o  changeset:   1:a93228700dc1
|/   user:        blob79
|    date:        Wed May 01 12:14:28 2013 +0200
|    summary:     add b
|
o  changeset:   0:448200734313
   user:        blob79
   date:        Wed May 01 12:14:28 2013 +0200
   summary:     add a


hg log -r "::tip" --patch

changeset:   0:448200734313
user:        blob79
date:        Wed May 01 12:14:28 2013 +0200
summary:     add a

diff -r 000000000000 -r 448200734313 a
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/a Wed May 01 12:14:28 2013 +0200
@@ -0,0 +1,1 @@
+a

changeset:   3:0fa31b1ebcf1
tag:         tip
parent:      0:448200734313
user:        blob79
date:        Wed May 01 12:14:28 2013 +0200
summary:     collapse change sets

diff -r 448200734313 -r 0fa31b1ebcf1 b
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/b Wed May 01 12:14:28 2013 +0200
@@ -0,0 +1,1 @@
+b
diff -r 448200734313 -r 0fa31b1ebcf1 c
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/c Wed May 01 12:14:28 2013 +0200
@@ -0,0 +1,1 @@
+c

As I already mentioned, you can use the rebase extension to move change sets around. A small exercise is to change the order of two change sets. With the same setup of three change sets as in the collapse example before, we move the changes from change set $ID_C before the changes made in change set $ID_B.

hg rebase --source $ID_C --dest $ID_A
hg rebase --source $ID_B --dest tip

@  changeset:   2:673889b80c51
|  tag:         tip
|  user:        blob79
|  date:        Wed May 01 12:14:29 2013 +0200
|  files:       b
|  description:
|  add b
|
|
o  changeset:   1:175e0515be8b
|  user:        blob79
|  date:        Wed May 01 12:14:29 2013 +0200
|  files:       c
|  description:
|  add c
|
|
o  changeset:   0:16ef477c54cb
   user:        blob79
   date:        Wed May 01 12:14:29 2013 +0200
   files:       a
   description:
   add a


A nice exercise for the reader is to extend the example to a fourth change set. This change set is the tip of the repository as a child of change set $ID_C. After the change set move it should be the child of change set $ID_B.

In real-world repositories changes are not fully independent, so while rebasing you have to resolve conflicts, but that's another blog post for another day.

20130423

Mini-Quickcheck for Python

I wanted an implementation of a mini-Quickcheck in Python. This is the API I came up with. It is also a good way to see what’s at the heart of Quickcheck: generators.

I cut every corner I could. Some methods are not random, but this can be easily fixed.

There is a runner decorator (which depends on the decorator library) that runs test methods repeatedly.

import datetime
import random

from decorator import decorator

def oneof(*values):
    return random.choice(values)

def optional(param):
    return oneof(None, param)

def boolean():
    return oneof(True, False)

def integer(min=0, max=1 << 30):
    return random.randint(min, max)

def char():
    return chr(integer(min=2, max=ord('z')))

def string(min=0):
    return "".join([char() for _ in xrange(integer(min=min, max=10))])

def nonempty_string():
    return string(min=1)

def substring(string):
    if not string:
        return string
    start = integer(0, len(string) - 1)
    end = start + integer(max=len(string) - start)
    return string[start:end]

def date():
    return datetime.date.today()

def subset(*vs):
    return [e for e in vs if boolean()]

def list(gen):  # shadows the built-in list; good enough for this sketch
    return [gen() for _ in xrange(integer(max=5))]

@decorator
def runner(func, *args, **kwargs):
    # run the decorated test method a few times with fresh random data
    for r in xrange(4):
        test_instance = args[0]
        func(*args, **kwargs)
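To show how such generators compose, here is a small self-contained usage sketch (Python 3, so range instead of xrange; the names mirror the API above but are redefined here, and the plain runner function is my stand-in for the decorator-based one):

```python
import random

def integer(min=0, max=1 << 30):
    return random.randint(min, max)

def string(min=0, max=10):
    # random lowercase string of length min..max
    return "".join(chr(integer(ord('a'), ord('z')))
                   for _ in range(integer(min, max)))

def substring(s):
    # a random (possibly empty) substring of s
    if not s:
        return s
    start = integer(0, len(s) - 1)
    end = start + integer(0, len(s) - start)
    return s[start:end]

def runner(test, runs=100):
    # run a property-style test repeatedly with fresh random data
    for _ in range(runs):
        test()

def substring_is_contained():
    s = string(min=1)
    assert substring(s) in s

runner(substring_is_contained)
```

The property here — every substring of s is contained in s — is trivially true, but the shape is exactly what the generators are for.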

About

My blog did not have an about page for years. Now that blogs are going out of fashion I can get one as well, as the chance that someone actually reads it diminishes.

Having an about page is nice for everybody involved. There is a nice personal touch to a site full of otherwise dry articles. This is also the space where a blogger can do some navel-gazing and bragging: a nice photo from a place I've been and you haven't, I can tell you what a great company I work for and what an amazingly smart guy I am.

The problem is: I like to stay at home most of the time, I work for a normal company and I'm definitely not smart. (My best trait is productive laziness.)

If you read this far, the one thing you should consider is becoming an organ donor. You won't mind. You may help somebody in a lot of trouble. At least I tried! Maybe you'd like to read a bit about organ donation...


(The fact that I wrote this after reading an about page is a mere coincidence. You should not ask somebody who they are. It's probably the worst question possible.)

20121219

Analogous function testing

For a long time I wanted to write about a testing technique I call analogous function testing. Some time ago I already wrote about the inverse function test pattern, and the analogous function testing pattern is mentioned in my piece about functional hash functions.

Instead of describing some synthetic examples I can now present the pattern with a real world example. I wrote tests for a Java port of a small C library. The phonet library returns the same string for words with the same pronunciation in German. In this way it’s similar to the Soundex and Metaphone algorithms in English.

The test I’m going to present found bugs in the Java port. More surprisingly, it also found bugs in the C library (that’s more than 10 years old) and in the JNI-bridge code we used for years.

If you have a look at the C code you’ll find out that it is not trivial. Well, that’s the most polite way to say that it’s devilishly hard to read. It’s beyond my means to read code that initializes 10 static variables and has 26 automatic variables in the main function. It’s not obvious how you can port this code to Java. Luckily, there was already a Java port that is more or less analogous to the C code - it’s not very idiomatic Java.

My goal was to fix the port. I ignored performance and maintainability issues.

How do you figure out that this quite complex function behaves the same way in the original and the ported version?

The trivial test approach of explicitly checking values is costly, and it’s hard to get good code coverage.

assertEquals("a", phonet("a"));

This way you need a lot of test code before you get some coverage. Line and branch coverage might be easy to attain, but the phonet library has 200+ rules that are evaluated by precedence. Are we going to write x * 200 tests to check that the right values are returned? If we check all rules, there are still all the characters left that have to be processed consistently. And once we are done with that, the phonet rules could change and we’d have to redo some of the tests. Not very exciting.

The trivial test approach is simply too expensive (and boring). Let’s try another approach that reduces our work to one line of code: write the test as an analogous function test.

The analogous test ensures that f(x) = g(x) for all x. f(x) is the C version and g(x) is the Java version. The only thing left is to generate x and see if f(x) == g(x) holds.

That’s the single line of code we have to write to get the test coverage. Okay, we have to write one line to check that f(x) = g(x) and some code to generate x. It all boils down to comparing the output of the C and the Java versions.

assertEquals(in, cPhonet.phonet(in), jphonet.code(in));
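The same test shape is easy to show in Python. Since the C and Java phonet implementations aren't available here, a trusted naive summation stands in for f(x) and Gauss's closed formula for g(x) — the point is only the f(x) == g(x) check over generated input:

```python
import random

def naive_sum(n):
    # simple, obviously correct reference implementation: f(x)
    return sum(range(n + 1))

def gauss_sum(n):
    # "hard" implementation under test: g(x)
    return n * (n + 1) // 2

# analogous function test: f(x) == g(x) for many generated x
for _ in range(1000):
    x = random.randint(0, 10000)
    assert naive_sum(x) == gauss_sum(x), x
```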

Unfortunately, we cannot create all values with brute force. The challenge is to find ways to generate the right input values and the quality of the test depends on the generated input values. I wrote the same test with multiple approaches each and every one useful for a facet of input values:
  • There’s a Quickcheck-based random generator test for large input
  • There’s a test with German dictionaries of known values
  • There’s an exhaustive test that generates all possible values up to a certain (small) size
As an example I’ll show you the exhaustive test for small strings. It uses Permutation.kPermutationWithRepetition(all, STRING_ADD, size) to create an Iterable that returns all permutations of size k from a set of input values with repetition of values (i.e. for the input AB and the size k = 2 it generates AA, AB, BA, BB).
@Test public void phonetDefinedCharacters() {
    Coder jphonet = createCoder();
    CPhonet cPhonet = new CPhonet();

    for (int size = 1; size <= 2; size++) {
        for (String in : kPermutationWithRepetition(
                CharEncoding.allChars(), STRING_ADD, size)) {
            assertEquals(in, cPhonet.phonet(in, rules()), jphonet.code(in));
        }
    }
}

You can have a look at the full implementation in the CPhonetVsPhonetikTest class.

As you can see from the test code the idea is simple. Following this approach will not guarantee 100% test coverage as long as you cannot generate all values. The hard part is to come up with the right test data. A pure Quickcheck-based approach is a good start, but might need to run too long to find critical bugs and additional tests are necessary.

Porting is only one class of tests to use analogous function testing for. Another class are tests where you already have a simple implementation. You are able to verify the correctness of the simple code easily. The simple implementation is then used to test the harder implementation (the one with preferable non-functional characteristics). It might be a good approach if you do not trust a piece of code, but you’re unable to read all of the source code. Just check it against a simple implementation.

A blind spot of an analogous function test is that bugs in f(x) and g(x) can mask each other. If you implement the same bug in both versions the test will not indicate the problem.

The moral from this story is that black-box tests with generated values can be a valuable tool besides classical tests and code reviews - but as with everything in life, it’s not perfect. It’s only one tool. Quality is still expensive.

20121212

Testing: Start with the result

If you write tests based on generated values, it’s incredible how often it is easier to start from the result value and work your way backwards to the input values (that result in the return value).

One part of the story is that you should not repeat the algorithm in the test. The only thing you’re testing is that you can write the same code in two places. Every error in your reasoning is repeated. It’s natural to start with a known return value and create input values that lead to this result.

I’ll demonstrate the principle with a functionality I recently used: k-permutation with repetition. (k defines the size of the output values. Permutation indicates that we care about order. Repetition says that output values can be created by reusing input values.) For example the k-permutation with repetition for the input set {A B} and k = 3 is {AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB}. We know that the size of the permutation is n^k. (n is the size of the input set.)
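Before looking at the test itself, the n^k size claim is quick to check in Python, where itertools.product with repeat=k is exactly a k-permutation with repetition (this is just an illustration, not the code under test):

```python
from itertools import product

items = ['A', 'B']
k = 3
perms = ["".join(p) for p in product(items, repeat=k)]
print(perms)  # ['AAA', 'AAB', 'ABA', 'ABB', 'BAA', 'BAB', 'BBA', 'BBB']
assert len(perms) == len(items) ** k  # n^k = 2^3 = 8
```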

@Test public void kPermutationWithRepetitions() {
    int maxSize = 6; 
    
    for(String expected : toIterable(strings(1, maxSize))) {
        Set<Character> allowed =
            newHashSet(Chars.asList(expected.toCharArray()));
        Set<Character> additional =
            anySet(characters(), 0, maxSize - allowed.size());
        allowed.addAll(additional);
    
    
        int k = expected.length();
        Iterable<String> actual =
            kPermutationWithRepetition(allowed, STRING_ADD, k);
        
        assertEquals(
            Math.pow(allowed.size(), k), Iterables.size(actual), 0);
        assertTrue(Iterables.contains(actual, expected));
    }
}

The test uses Quickcheck and Guava. It starts by generating one expected result value. The interesting input value is the set of allowed characters. This set contains all characters of the result string plus some additional characters. We have to take care that the input set is small enough, otherwise the result size is too large to run the test in practice. With this in place we can call kPermutationWithRepetition that generates the actual result. The test checks that the expected value is in the result and that the size of the permutation is correct.

This example demonstrates that it can be much easier to start with an expected return value. Depending on the situation it’s appropriate to create the entire result or just aspects of it. If you can generate at least one result value, the generated values are evenly distributed over all results, and you know the total result size, then your test is sufficient.
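A rough Python transcription of the same test makes the backwards step explicit. The k_permutation_with_repetition helper here is my own stand-in built on itertools.product, not the Quickcheck/Guava code from the Java test:

```python
import random
from itertools import product

def k_permutation_with_repetition(allowed, k):
    # hypothetical stand-in for the Java helper used in the post
    return ["".join(p) for p in product(sorted(allowed), repeat=k)]

max_size = 4
for _ in range(25):
    # start with the result: generate the expected value first ...
    expected = "".join(random.choice("abcde")
                       for _ in range(random.randint(1, max_size)))
    # ... then work backwards to an input set that must produce it
    allowed = set(expected)
    while len(allowed) < max_size:
        allowed.add(random.choice("abcdefghijklmnopqrstuvwxyz"))
    k = len(expected)
    actual = k_permutation_with_repetition(allowed, k)
    assert len(actual) == len(allowed) ** k   # n^k results
    assert expected in actual                 # the known value is among them
```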

20121128

My little stupid Raspberry Pi project: 1 LED and 1 Switch


I recently bought a Raspberry Pi. My goal is to build a living room mp3 player. As I’m a software guy the programming part isn’t a problem, but the mp3 player has to be able to respond to some buttons having a semantic like: “Hey, next song please!”. So I have to do some simple hardware. Sadly, my knowledge about electronics has faded. This is the journey of an electronics ignorant in hardware land.

As a first step, I put two examples from the raspbian user guide together. Pressing a switch toggles an LED, enabling and disabling it. Right now I’ve no idea how to calculate the pull-up resistor for the switch and the resistor for the LED. I was happy to see that it simply worked.



Parts: Breadboard, 10k Ohm, 150 Ohm, Switch, 1 Led, Jumper Wires

What I learnt so far

There are four and five band resistor color codings. One half of the resistors I use are four band coded and the other half five band coded. Man, this is like the software world with backward compatibility. Can’t they just use five bands?

The world outside of a computer has x, y and z axes. There is a difference between nice, short and too f***ing short cables. Because they had only male jumper wires at my store, things are a bit improvised anyway. I had to fit an end connector pin on the male wire. The guy at the shop said I should solder it on. Yeah right, the first thing I’ll do is soldering; adding some shrink tubing is also a good idea. The x, y, z problem also applies to switches. They actually have to fit into the breadboard.

Software-wise the project is very simple. I use the raspberry pi GPIO python library. I read the state of the switch from port 12 and enable the LED on the first and disable the LED on the second state change.

import RPi.GPIO as GPIO
import time

GPIO.setmode(GPIO.BOARD)
GPIO.setup(11, GPIO.OUT)  # LED
GPIO.setup(12, GPIO.IN)   # switch
state = False

def wait():
    time.sleep(0.1)

while True:
    wait()  # don't burn the CPU
    if not GPIO.input(12):  # switch pressed
        print state
        state = not state
        GPIO.output(11, state)
        while not GPIO.input(12):  # wait until the switch is released
            wait()

Not very surprisingly, it’s a good idea not to poll the input port constantly. Otherwise the Python process would use 100% of the available CPU time. I don’t know yet how to replace the polling with GPIO interrupts.


I think I won the first round of the hardware game:
a) It works.
b) My Raspberry is still alive.
c) It’s quite clear I have to learn and practice a lot.