20130527

Speed up your project build without the tmpfs hassle

Running all tests in my current project takes some time. With 26 GB of free memory, why not use it for something useful? tmpfs is one way to speed up the test execution by keeping a complete file system in memory.

The problem with tmpfs is that its content is only kept in memory. You have to set up the scripts yourself to flush content back to disk. These scripts had better work perfectly, otherwise you'll lose parts of your work.

A common approach is to work directly in a tmpfs folder and back up your work to a folder on disk. When your machine boots, you restore the tmpfs folder from this backup folder. Once booted, cron is used to sync the tmpfs folder and the disk folder.

I found this setup a bit complicated and error-prone. I never really trusted my own setup at boot time and with cron. Now I use a much simpler setup that does not use cron at all.

The performance on my machine when running a single test, using the IDE and deploying to a web server was always reasonable. Only running all tests takes too much time.

The sweet spot I found is to set up a workspace on disk, sync it to tmpfs under /dev/shm and run all tests there. This keeps my setup more or less unchanged and removes the possibility of losing work just because I'm too dumb to set things up correctly.

The resulting performance increase is reasonable:

$ nosetests && run_tests.py
........................................................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 248 tests in 107.070s

OK
........................................................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 248 tests in 19.423s

OK

It's now about 5.5 times faster than before.
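
The numbers above can be checked with a few lines of Python. This is only a sketch: it assumes the first run (107.070s) is the one on disk and the second run (19.423s) is the one on tmpfs, and the variable names are made up for illustration.

```python
# Wall-clock times reported by nosetests in the two runs above.
disk_seconds = 107.070   # assumed: run in the regular on-disk workspace
tmpfs_seconds = 19.423   # assumed: run in the /dev/shm copy

speedup = disk_seconds / tmpfs_seconds
saved = disk_seconds - tmpfs_seconds

print("speedup: %.1fx, saved: %.1f s per run" % (speedup, saved))
```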

With Python the setup is quite simple:

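#!/bin/bash -e

WORK=src/py                       # project subfolder that contains setup.py
LOG="$(pwd)/test.log"
TARGET="$(hg root)"               # root of the on-disk Mercurial workspace
SHADOW="/dev/shm/shadow/$TARGET"  # copy of the workspace on tmpfs

date > "$LOG"
mkdir -p "$SHADOW"

# Mirror the workspace into tmpfs, skipping dot files and the virtualenv
cd "$SHADOW"
rsync --update --delete --exclude=".*" --exclude=ENV --archive "$TARGET" ./..

# Create the virtualenv only if it is not there yet
if [ ! -d ENV ]
then
    virtualenv ENV
fi
. ENV/bin/activate

cd "$WORK"
python setup.py develop >> "$LOG"
# Pass all arguments on to nosetests and return its exit code, not tee's
nosetests "$@" | tee -a "$LOG"
exit "${PIPESTATUS[0]}"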

I'm just resyncing into a /dev/shm folder, setting up the test environment there (virtualenv and python setup.py) and running the tests (nosetests).
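
The path handling and the rsync call of the script can also be sketched in Python. This is a rough illustration, not a replacement for the script: the function names and the example project path are hypothetical, and the rsync call uses the trailing-slash form (copy the contents of one folder into the other) instead of syncing into the parent folder.

```python
import os

SHM_ROOT = "/dev/shm/shadow"  # same tmpfs location the script above uses


def shadow_path(target, shm_root=SHM_ROOT):
    """Mirror an absolute project path below the tmpfs root."""
    # os.path.join would discard shm_root if the second part were absolute,
    # so the leading separator is stripped first.
    return os.path.join(shm_root, os.path.abspath(target).lstrip(os.sep))


def rsync_command(target, shadow):
    """Build the rsync argument list: update, delete, skip dot files and ENV."""
    return [
        "rsync", "--update", "--delete",
        "--exclude=.*", "--exclude=ENV",
        "--archive",
        target.rstrip(os.sep) + os.sep,   # trailing slash: copy the contents
        shadow.rstrip(os.sep) + os.sep,
    ]


if __name__ == "__main__":
    target = "/home/me/project"           # hypothetical project root
    shadow = shadow_path(target)
    print(shadow)
    print(" ".join(rsync_command(target, shadow)))
```

The resulting list could be handed to subprocess.check_call once the shadow folder exists.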

It is still possible to run single tests from the command line in the tmpfs folder. It's also possible to kick this off from your IDE, but you'll lose your test runner and debugging capabilities. As I said earlier, I don't need these right now.

I hope my little twist with tmpfs helps with setting up a faster development environment without all the scripting hassle.

20130501

How to use the Mercurial Rebase extension to collapse and move change sets

Every now and then I like to combine several change sets into one change set. You can do this with the rebase extension. I will show you how rebase works with some examples.

The other major use case for the rebase extension is to keep the history of your Mercurial repository linear when working with a team of developers. The extension gets its name from this use case: changing the parent change set of your changes to another change set (often the tip). As long as every developer uses pull, rebase, push, the history is free of merge change sets.

I set up a repository with three change sets.

hg init repo
cd repo

echo a > a
hg commit -A -m"add a"
ID_A=$(hg id -n)

echo b > b
hg commit -A -m"add b"
ID_B=$(hg id -n)

echo c > c
hg commit -A -m"add c"
ID_C=$(hg id -n)

hg version | head -n 1 ; echo
hg glog --patch

Mercurial Distributed SCM (version 2.5.4)

@  changeset:   2:aa77f078beed
|  tag:         tip
|  user:        blob79
|  date:        Wed May 01 12:14:27 2013 +0200
|  summary:     add c
|
|  diff -r a692ab61aad8 -r aa77f078beed c
|  --- /dev/null        Thu Jan 01 00:00:00 1970 +0000
|  +++ b/c      Wed May 01 12:14:27 2013 +0200
|  @@ -0,0 +1,1 @@
|  +c
|
o  changeset:   1:a692ab61aad8
|  user:        blob79
|  date:        Wed May 01 12:14:27 2013 +0200
|  summary:     add b
|
|  diff -r e31426a230b0 -r a692ab61aad8 b
|  --- /dev/null        Thu Jan 01 00:00:00 1970 +0000
|  +++ b/b      Wed May 01 12:14:27 2013 +0200
|  @@ -0,0 +1,1 @@
|  +b
|
o  changeset:   0:e31426a230b0
   user:        blob79
   date:        Wed May 01 12:14:27 2013 +0200
   summary:     add a

   diff -r 000000000000 -r e31426a230b0 a
   --- /dev/null        Thu Jan 01 00:00:00 1970 +0000
   +++ b/a      Wed May 01 12:14:27 2013 +0200
   @@ -0,0 +1,1 @@
   +a

After collapsing the change sets $ID_B and $ID_C, a new change set containing the changes of both is in the history. This change set can then be pulled or imported as one unit, making the history of the repository a bit easier to read.

hg rebase --collapse --source $ID_B --dest $ID_A -m"collapse change sets"
hg glog --patch

saved backup bundle to /home/thomas/Desktop/rebaseblob/repo/.hg/strip-backup/a692ab61aad8-backup.hg
@  changeset:   1:bb2e0cde315f
|  tag:         tip
|  user:        blob79
|  date:        Wed May 01 12:14:28 2013 +0200
|  summary:     collapse change sets
|
|  diff -r e31426a230b0 -r bb2e0cde315f b
|  --- /dev/null        Thu Jan 01 00:00:00 1970 +0000
|  +++ b/b      Wed May 01 12:14:28 2013 +0200
|  @@ -0,0 +1,1 @@
|  +b
|  diff -r e31426a230b0 -r bb2e0cde315f c
|  --- /dev/null        Thu Jan 01 00:00:00 1970 +0000
|  +++ b/c      Wed May 01 12:14:28 2013 +0200
|  @@ -0,0 +1,1 @@
|  +c
|
o  changeset:   0:e31426a230b0
   user:        blob79
   date:        Wed May 01 12:14:27 2013 +0200
   summary:     add a

   diff -r 000000000000 -r e31426a230b0 a
   --- /dev/null        Thu Jan 01 00:00:00 1970 +0000
   +++ b/a      Wed May 01 12:14:27 2013 +0200
   @@ -0,0 +1,1 @@
   +a
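
What the collapse does to the history can be pictured with a toy model in Python. This is only a sketch of the idea and not how Mercurial works internally: change sets are plain dicts, and the collapse function is made up for illustration.

```python
def collapse(changesets, message):
    """Fold a list of change sets (oldest first) into a single change set.

    Each change set is a dict with a 'summary' and a 'files' mapping of
    file name -> content. Later changes to the same file win.
    """
    files = {}
    for cs in changesets:
        files.update(cs["files"])
    return {"summary": message, "files": files}


history = [
    {"summary": "add a", "files": {"a": "a\n"}},
    {"summary": "add b", "files": {"b": "b\n"}},
    {"summary": "add c", "files": {"c": "c\n"}},
]

# Keep "add a" and fold everything after it, like --source $ID_B --dest $ID_A.
collapsed = history[:1] + [collapse(history[1:], "collapse change sets")]

for number, cs in enumerate(collapsed):
    print(number, cs["summary"], sorted(cs["files"]))
```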

If you have to keep the change sets because they were already published, you can keep the old change sets and add the new collapsed change set as a separate branch.

hg rebase --keep --collapse --source $ID_B --dest $ID_A -m"collapse change sets"
hg glog

@  changeset:   3:0fa31b1ebcf1
|  tag:         tip
|  parent:      0:448200734313
|  user:        blob79
|  date:        Wed May 01 12:14:28 2013 +0200
|  summary:     collapse change sets
|
| o  changeset:   2:0a6400bebe21
| |  user:        blob79
| |  date:        Wed May 01 12:14:28 2013 +0200
| |  summary:     add c
| |
| o  changeset:   1:a93228700dc1
|/   user:        blob79
|    date:        Wed May 01 12:14:28 2013 +0200
|    summary:     add b
|
o  changeset:   0:448200734313
   user:        blob79
   date:        Wed May 01 12:14:28 2013 +0200
   summary:     add a


hg log -r "::tip" --patch

changeset:   0:448200734313
user:        blob79
date:        Wed May 01 12:14:28 2013 +0200
summary:     add a

diff -r 000000000000 -r 448200734313 a
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/a Wed May 01 12:14:28 2013 +0200
@@ -0,0 +1,1 @@
+a

changeset:   3:0fa31b1ebcf1
tag:         tip
parent:      0:448200734313
user:        blob79
date:        Wed May 01 12:14:28 2013 +0200
summary:     collapse change sets

diff -r 448200734313 -r 0fa31b1ebcf1 b
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/b Wed May 01 12:14:28 2013 +0200
@@ -0,0 +1,1 @@
+b
diff -r 448200734313 -r 0fa31b1ebcf1 c
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/c Wed May 01 12:14:28 2013 +0200
@@ -0,0 +1,1 @@
+c

As I already mentioned, you can use the rebase extension to move change sets around. A small exercise is to change the order of two change sets. With the same setup of three change sets as in the collapse example before, we move the changes from change set $ID_C before the changes made in change set $ID_B.

hg rebase --source $ID_C --dest $ID_A
hg rebase --source $ID_B --dest tip

@  changeset:   2:673889b80c51
|  tag:         tip
|  user:        blob79
|  date:        Wed May 01 12:14:29 2013 +0200
|  files:       b
|  description:
|  add b
|
|
o  changeset:   1:175e0515be8b
|  user:        blob79
|  date:        Wed May 01 12:14:29 2013 +0200
|  files:       c
|  description:
|  add c
|
|
o  changeset:   0:16ef477c54cb
   user:        blob79
   date:        Wed May 01 12:14:29 2013 +0200
   files:       a
   description:
   add a
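
The two rebase steps boil down to changing parents. A toy model in Python makes this visible. It is a sketch under simplifying assumptions: change sets are identified by their summary, the history is a plain parent map, and the new hashes and revision numbers that Mercurial assigns are ignored. The function names are made up.

```python
def rebase(parents, source, dest):
    """Return a new parent map in which `source` hangs below `dest`.

    `parents` maps a change set name to its parent name (None for the root).
    Descendants of `source` keep their parent, so they move along with it.
    """
    moved = dict(parents)
    moved[source] = dest
    return moved


def linear_order(parents):
    """List the change sets from the root to the tip of a linear history."""
    children = {parent: name for name, parent in parents.items()}
    order, current = [], children.get(None)
    while current is not None:
        order.append(current)
        current = children.get(current)
    return order


parents = {"add a": None, "add b": "add a", "add c": "add b"}

step1 = rebase(parents, "add c", "add a")   # hg rebase --source $ID_C --dest $ID_A
step2 = rebase(step1, "add b", "add c")     # hg rebase --source $ID_B --dest tip

print(linear_order(step2))
```

After the first step "add a" has two children, so the history is only linear again once the second step is done.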


A nice exercise for the reader is to extend the example with a fourth change set. This change set is the tip of the repository as a child of change set $ID_C. After the move it should be the child of change set $ID_B.

In real-world repositories changes are not fully independent, so while rebasing you have to resolve conflicts, but this is another blog post for another day.

20130423

Mini-Quickcheck for Python

I wanted an implementation of a mini-Quickcheck in Python. This is the API I came up with. It is also a good way to see what’s at the heart of Quickcheck: generators.

I cut every corner I could. Some methods are not random, but this can be easily fixed.
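
One of the non-random helpers in the listing below is date(), which always returns today. A random variant could look roughly like this. It is a sketch only: the name random_date and the default start date are assumptions, not part of the API.

```python
import datetime
import random


def random_date(start=datetime.date(1970, 1, 1), end=None):
    """Return a uniformly chosen date between start and end (both inclusive)."""
    if end is None:
        end = datetime.date.today()
    days = (end - start).days
    return start + datetime.timedelta(days=random.randint(0, days))


print(random_date())
```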

There is a runner decorator (dependent on the decorator library) that runs test methods repeatedly.

import datetime
import random

from decorator import decorator

def oneof(*values):
    return random.choice(values)

def optional(param):
    return oneof(None, param)

def boolean():
    return oneof(True, False)

def integer(min=0,max=1<<30):
    return random.randint(min,max)

def char():
    return chr(integer(min=2,max=ord('z')))

def string(min=0):
    return "".join([char() for _ in range(integer(min=min, max=10))])

def nonempty_string():
    return string(min=1)

def substring(string):
    if not string:
        return string
    start = integer(0, len(string) - 1)
    # max, not min: the end may lie anywhere between start and len(string)
    end = start + integer(max=len(string) - start)
    return string[start:end]

def date():
    return datetime.date.today()

def subset(*vs):
    return [e for e in vs if boolean()]

def list(gen):
    return [gen() for _ in range(integer(max=5))]
    
@decorator
def runner(func, *args, **kwargs):
    for r in range(4):
        test_instance = args[0]
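
To show how generators and a repeating runner work together, here is a minimal self-contained sketch. It is not the runner from the listing above: repeat is a hypothetical stand-in built on functools.wraps instead of the decorator library, it only assumes that the test is called several times, and int_list and the test itself are made up for illustration.

```python
import functools
import random


def integer(min=0, max=1 << 30):
    return random.randint(min, max)


def int_list(max_len=5):
    """Generate a list of up to max_len random integers."""
    return [integer(max=100) for _ in range(integer(max=max_len))]


def repeat(times=4):
    """Hypothetical stand-in for the runner: call the test `times` times."""
    def wrap(func):
        @functools.wraps(func)
        def run(*args, **kwargs):
            for _ in range(times):
                func(*args, **kwargs)
        return run
    return wrap


@repeat(times=100)
def test_sorting_is_idempotent():
    values = int_list()          # fresh random input on every run
    once = sorted(values)
    assert sorted(once) == once
    assert len(once) == len(values)


test_sorting_is_idempotent()
print("property held for 100 random lists")
```

Every run draws new input from the generators, so the test checks a property of the code and not a single hand-picked example.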

About

My blog did not have an about page for years. Now that blogs are going out of fashion I can get one as well, as the chances that someone actually reads it diminish.

Having an about page is nice for everybody involved. There is a nice personal touch to the site full of otherwise dry articles. This is also the space where a blogger can do some navel-gazing and bragging: a nice photo from a place I have been and you have not, and I can tell you what a great company I work for and what an amazingly smart guy I am.

The problem is: I like to stay at home most of the time, I work for a normal company and I'm definitely not smart. (My best trait is productive laziness.)

If you read this far, the one thing you should consider is becoming an organ donor. You won't mind. You may help somebody in a lot of trouble. At least I tried! Maybe you would like to read a bit about organ donation...


(The fact that I wrote this after reading an about page is a mere coincidence. You should not ask somebody who they are. It's probably the worst question possible.)