Find cruft with a new Mercurial extension

After some fun with the quick and untested shell script that finds the oldest code in a Subversion repository, it is the next step to write a Mercurial extension. The simple Mercurial extension cruft does basically the same job as the shell scripts for Subversion. Being an extension it’s nicely integrated into Mercurial as the other extensions.

Python and Mercurial are relatively easy to get into. Mercurial provides the Developer Info page which is really good. Additionally, there’s a guide how to write a Mercurial extension. The guide is good start for the Mecurial development. The rest can be easily picked up by reading the code of other commands and extensions.
The code is readable and there are no big hurdles.

The only thing I missed while writing the extension is type information in method signatures. As much as I like Python it’s ridiculous to write the type information in the pydoc and let the developer figure out the types. This one of the trade-offs you have to live with.

Testing Mercurial extensions

It suffices to understand the integration tests tool Mercurial uses to test the extension itself. There’s some documentation for this as well. The basic idea behind Cram is to start a process and check against the expected output.

The integration test tool defines a small language. All lines that have no indentation are comments. Indented lines starting with $ are executed and all other lines are the expected output. For example a test looks like this:


 $ hg init

 $ cat <<EOF >>a
 > c1
 > c2
 > EOF
 $ hg ci -A -m "commit 0"
 adding a


 $ hg cruft
 0 a c1
 0 a c2

First a repository is initialized: a file called a with the content (c1,c2) is committed and then the Mercurial is started with the cruft command. Without options the cruft extension prints all lines with newest lines first. The expected output is (0 a c1, 0 a c2) which is means the revision 0 file a and line c1; revision 0 file a and line c2.

It’s fairly easy to get started with this tool. The only downside in my tests is that the they reuse the same test fixture and do not reset the fixture for each test. They are not executed in isolation, which has a whole range of problems - redundancy and readability for example - but I didn’t feel that it was worth the effort to structure the tests otherwise.

Installing the extension

The easiest way to install the extension is to download the cruft.py to a local folder and add a link to the extension file in the .hgrc file.


Using the extension

After the installation you can execute pretty much the same commands as with the shell script version.

hg help cruft

hg cruft

(no help text available)


-l --limit VALUE   oldest lines taken into account
-c --changes       biggest change sets
-f --files         biggest changes per file
-X --filter VALUE  filter lines that match the regular expression
    --mq            operate on patch repository

use "hg -v help cruft" to show global options

I use here the quickcheck source code to show some sample output.

hg cruft -l 5 -X "^(\s*}\s*|\s*/.*|\s*[*].*|\s*|\s*@Override\s*|.*class.*|import.*|package.*)$" quickcheck-core/src/main
This finds the oldest 5 lines using the Java source code specific exclusion pattern (parentheses, imports, class definitions etc.) for the quickcheck-core/src/main folder. The output contains the revision number, source file and source code line.

5 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/TupleGenerator.java public Object[] next() {
5 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/TupleGenerator.java ArrayList<Object> next = new ArrayList<Object>(generators.length);
5 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/TupleGenerator.java for (Generator<?> gen : generators) {
5 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/TupleGenerator.java next.add(gen.next());
5 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/TupleGenerator.java return next.toArray();
You can also find the biggest change sets for the last 500 lines.

hg cruft -X "^(\s*}\s*|\s*/.*|\s*[*].*|\s*|\s*@Override\s*|.*class.*|import.*
|package.*)$" -l 500 -c quickcheck-core/src/main
This prints the revisions number, number of changed lines and commit comment of the change set.

49 41 removed getClassification method from Property interface
moved Classification into quickcheck.property package
177 43 MutationGenerator, CloningMutationGenerator and CloningGenerator added
139 50 fixed generic var arg array problems
5 53 initial check in

Finally, you can find the files with the most lines changed by a single change set (again with the filter and for the 500 oldest lines).

hg cruft -X "^(\s*}\s*|\s*/.*|\s*[*].*|\s*|\s*@Override\s*|.*class.*|import.*
|package.*)$" -l 500 -f quickcheck-core/src/main
This prints the revision number, file name, number of changes and change set commit comment.

176 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/AbstractTreeGenerator.java 27 added tree generator
177 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/CloningGenerator.java 28 MutationGenerator, CloningMutationGenerator and CloningGenerator added
139 quickcheck-core/src/main/java/net/java/quickcheck/generator/support/DefaultFrequencyGenerator.java 36 fixed generic var arg array problems
49 quickcheck-core/src/main/java/net/java/quickcheck/characteristic/Classification.java 41 removed getClassification method from Property interface
moved Classification into quickcheck.property package


Developing a Mercurial extension is relatively easy given Python, the good Mercurial documentation, the good readability of the code and integration test tool. If you’re using Mercurial you should give Mercurial extension development a try. I’ve only recently read into Python again so this is the Python beginner’s version of a Mercurial extension. Help to improve the implementation is always appreciated.

Learning Python and seeing how things are implemented there is fun. Looking at the PEPs and the associated process, they feel much more accessible and open than JSRs. The PEPs are also a track record of the advances the language makes and problems it tries to solve one after the other. There’s stuff in Python that you’ll probably never see in Java like the generator expressions. Everyone who had to replace an internal loop with an iterator will understand that this is not a toy. The language features seem to sum up quite nicely and result in a productive environment. As always, some things are unfamiliar or missing but there’s no perfect platform.