20100118

Why overrideable tail recursive methods must not be optimized

Suppose you define a tail recursive method x in an object A. You can use the @tailrec annotation to mark methods where you expect the compiler to optimize the tail recursive code.
import annotation._
object A{
    @tailrec def x(i : Int, s : String) : String = if(i > 0) x(i - 1, s + "a:" + i) else s
}
Now you move the method x to a trait or class and suddenly the compilation fails.
import annotation._
trait A{
    @tailrec def x(i : Int, s : String) : String = if(i > 0) x(i - 1, s + "a:" + i) else s
}
:8: error: could not optimize @tailrec annotated method
            @tailrec def x(i : Int, s : String) : String = if(i > 0) x(i - 1, s + "a:" + i) else s
I had exactly this problem. I defined a perfectly valid tail recursive function, but the compiler would not optimize it into a loop. This was in Scala 2.7, which does not support the @tailrec annotation, so I had the additional issue of spotting the problem in the first place.

The problem is that overrideable tail recursive methods won't be optimized. If you change the code to
import annotation._
trait A{
    @tailrec final def x(i : Int, s : String) : String = if(i > 0) x(i - 1, s + "a:" + i) else s
}
it will compile again. This is the rule. But why?

A performance optimization must not change the behavior at run-time. But overriding an optimized tail recursive method would behave differently from overriding the unoptimized method.

Let's define A in a way that won't be optimized and override the tail recursive method x in a subclass B.
class A{
    def x(i : Int, s : String) : String = if(i > 0) x(i - 1, s + "a:" + i) else s
}

class B extends A{
    override def x(i : Int, s: String) : String = super.x(i, s + "b:")
}
This will produce the output:
scala> new A().x(3, "")
res0: String = a:3a:2a:1

scala> new B().x(3, "")
res1: String = b:a:3b:a:2b:a:1b:
The implementation of x in B invokes x in A via super. When the x defined in A then invokes x recursively, the call is dispatched to the x in B (for an instance of B). This behavior is due to late binding of the calls to x.

Suppose it were possible to optimize the method x in A. The compiler would generate a class similar to the following definition of A, with the recursive invocation of x optimized into a loop. With this definition of x there is no possibility of late binding any more: the optimization hard-wires the definition of x.
class A{
    def x(i : Int, s : String) : String = {
        var j = i
        var t = s
        while(j > 0){
            t = t + "a:" + j
            j = j - 1
        }
        t
    }
}

class B extends A{
    override def x(i : Int, s: String) : String = super.x(i, s + "b:")
}
scala> new A().x(3, "")
res0: String = a:3a:2a:1

scala> new B().x(3, "")
res1: String = b:a:3a:2a:1
The early binding of x in A changes the behavior of B: once the method x in A is called, control never comes back to B. This is the reason why the Scala compiler can't optimize overrideable tail recursive methods.
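By the way, if a method has to stay overridable, you can still get the optimization by delegating to a method that cannot be overridden. A minimal sketch of this idiom (the helper name loop is mine); note that it deliberately fixes the loop semantics, so the recursion no longer dispatches to an override in a subclass:

import annotation._
trait A{
    // overridable entry point, no recursion here
    def x(i : Int, s : String) : String = loop(i, s)

    // a private method can't be overridden, so @tailrec is happy
    @tailrec private def loop(i : Int, s : String) : String =
        if(i > 0) loop(i - 1, s + "a:" + i) else s
}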

Following this explanation, it should be clear that the @tailrec annotation is worthwhile and should be used in all your Scala 2.8 code where you expect the optimization to happen. For Scala 2.7 it's a bit unfortunate that moving a tail recursive method from an object to a trait will silently disable the optimization, which can change the run-time behavior significantly (think stack overflows), without any compiler error. So be warned.

20100111

Lessons from Stackoverflow: missing education should not stop you from being a good developer

To put this blog into perspective: I have the weakest of all possible computer science degrees you can imagine. It's formally a degree in computer science, but half of my curriculum was in communication engineering, which is an interesting field (if you want to build a digital hearing aid for your grandma) but does not really help with software development.

I do appreciate the work of Jeff Atwood at Stackoverflow. It's a great site and a viable programming resource. Judging by his results, he is a good developer.

Ironically, you can build a good product without being a top-notch computer scientist. This is really encouraging for me.

I'd been reading Jeff Atwood's blog for more than two years because it was funny, entertaining, presented a down-to-earth view of IT and provided insights into a community (.net and Windows) different from my Java community. But around the time he started Stackoverflow I quit reading his blog. I cannot say exactly why, but it lost its appeal a bit. I liked to read more computer science related stuff (mostly from the functional programming world). Roughly one year later I'm back again. I really like Stackoverflow.

But time and again, listening to the Stackoverflow podcasts makes me wonder how that's possible. How could he write Stackoverflow? In the latest Stackoverflow podcast #79, Joel Spolsky tried to explain that you can only parse regular languages with regular expressions. (Actually, most implementations allow more than that. Back references, for example, are more powerful than regular expressions.) This explanation goes on and on. It was like a déjà vu moment when he tried to explain to Jeff and Scott Hanselman the Hindley-Milner type inference algorithm used by the Haskell compiler. (He tried to explain that you can deduce the return type of a method from the parameters and the functions called. Let's say he did not get too far.)
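To make the parenthetical concrete, a small sketch in Scala (any regex flavor with back references will do): the back reference \1 matches a repetition of whatever the group captured, which takes you beyond regular languages.

import scala.util.matching.Regex

object BackReference {
    def main(args : Array[String]) {
        // (a+)\1 matches a block of a's followed by the same block again,
        // i.e. strings of the form ww - the classic non-regular copy language
        val r = new Regex("""(a+)\1""")
        println(r.findFirstIn("aaaa")) // Some(aaaa)
        println(r.findFirstIn("a"))    // None
    }
}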

Jeff explains why it's great to have his first open-source project, how many bugs he fixed, and that he is now at the point where the hairy problems start. Few people seem to be able to help fix the problems in the C# port of the PHP and Perl implementations. Joel explains again and again that you need a lexer and parser generator to parse Markdown, and every student who took a parser course can tell you that this is the right tool for this kind of problem, and that a transformation from an AST is a piece of cake. I never had a parser course, so don't expect a scientific explanation here, but I was at least able to parse a subset of SQL with parser combinators. The moment Jeff explained that the parsing in his open-source project is done with regular expressions, I thought: "Hell, why didn't he use a lexer/parser for that?"
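For flavor, a minimal sketch with Scala's parser combinators in the spirit of that SQL-subset exercise (the grammar is illustrative, not the one I actually wrote):

import scala.util.parsing.combinator._

object SelectParser extends JavaTokenParsers {
    // a tiny SQL fragment: select a, b from t
    def select : Parser[Any] = "select" ~ repsep(ident, ",") ~ "from" ~ ident
    def parse(s : String) = parseAll(select, s)
}

SelectParser.parse("select a, b from t") succeeds, while an input like "select from" fails with an error message and position information. Getting the same robustness out of regular expressions is much harder.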

But you can learn something from Jeff Atwood's example: your meager education should not stop you from being a good developer. We can't all have an ivy league education. (It wouldn't be ivy league if everyone got a degree. The system only works because it discriminates between students.) Let's get over it. Put your enthusiasm into your work and learn the bits and pieces you have to learn to get the job done. Try to constantly improve your theoretical knowledge, your tool knowledge, the quality of your code and your communication skills. Maintain your critical thinking to spot ways you can improve. You can't know everything, but you should know what you don't know:

There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.

The known unknowns will slow you down, but only the unknown unknowns can hurt you. These are the places where you put too much effort into a problem that is already solved, like parsing. But a star maverick developer can only develop so much. There will be times when a Duct Tape Developer is needed and the pure theory boys won't get the job done. This is the one thing you can learn from the Stackoverflow story: it's the motivation, stupid.

By the way, if you have doubts about your abilities, this is probably a good sign.

20100107

How to setup a Maven Scala project with IntelliJ IDEA support

Setting up a Maven and Scala project with IDEA is simple.

Install IntelliJ IDEA Community Edition and the Scala plugin (File -> Settings -> Plugins -> Available).

Generate your maven project using the simple Scala archetype:

mvn org.apache.maven.plugins:maven-archetype-plugin:1.0-alpha-7:create \
    -DarchetypeGroupId=org.scala-tools.archetypes \
    -DarchetypeArtifactId=scala-archetype-simple \
    -DarchetypeVersion=1.2 \
    -DremoteRepositories=http://scala-tools.org/repo-releases \
    -DgroupId=group \
    -DartifactId=artifact

Add the Scala compiler lib to your project dependencies in the pom.xml file:
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-compiler</artifactId>
        <version>${scala.version}</version>
        <scope>compile</scope>
    </dependency>
</dependencies>

Update the Scala version configuration to a recent release:
<properties>
    <scala.version>2.7.7</scala.version>
</properties>
Open a new project and select the pom.xml in the file dialog.

A notification will pop up. Click "Create Scala Facet".

Done.
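To verify the setup, you can run a small program like the following (a minimal sketch; the archetype already generates a similar application class, the class name here is my own):

package group

object Hello {
    // run it from the IDE or with the maven build
    def main(args : Array[String]) {
        println("Hello from the Maven Scala project")
    }
}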

You can now compile and execute as usual and run the Maven build mvn install from the Maven Tool Window. (The M2_HOME environment variable or the Maven home setting (File -> Settings -> Maven -> Maven home directory) has to be present.)

20100106

The best Scala IDE: IntelliJ IDEA

After years and years of using Eclipse (after Netbeans, JBuilder, Visual Cafe, Emacs and other spooky Java "editors"), IntelliJ IDEA is a powerful, convenient and simple to use IDE for Scala development. Oh, and yes, it's free: IntelliJ IDEA Community Edition.

I tried IDEA because the Scala plug-in for Eclipse is so horrible. It works so badly that using good old jEdit with some macros is way better. On a small project I had to clean the whole project for every compile. It's that bad. After several tries (starting sometime in 2008) I've given up on this thing.

Before IDEA I also tried Netbeans, but it failed the dumb-developer-that-does-not-read-the-manual test. Basic functionality should be usable without a detailed study of the documentation. That worked for me with IDEA, but I could not get the Scala plugin to run in Netbeans. It's probably me and everything is fine with Netbeans. Still, I've invested less time to get things done with IDEA. It simply worked.

The simple summary: every basic IDE feature works: syntax highlighting, compiling, debugging, navigation and outline.

If you were worried about the productivity of Scala development because it was missing IDE support: stop worrying. The support is here.

I'm a heavy shortcut user, so using a new IDE naturally slows me down. Using the Eclipse key map was not an option for me: it works only 80% of the time. The main problem is that functionality bound to the same shortcut behaves slightly differently, and that's maddening in the long run. So it's better to learn a new set of shortcuts. It's like driving in Great Britain: it's easier when you are using a car with the steering wheel on the right. Driving a car with the steering wheel on the left is dangerous. There are the countryside hedges where you'll see practically nothing in a left turn, and chances are that you're going to start your trip on the right-hand side of a calm country road. So better relearn.

Interestingly, IDEA has a code analysis tool that does spell checking in comments, method names and variable names, and this already works for Scala. Eclipse does spell checking only in comments. This is a boon for code quality: you do not want to read incorrectly spelled source code. It is plain annoying to read, fix and, frankly, to get right (camel case does not make it easier). (The code analysis tool for Java is very good. It finds tons and tons of problems even after running FindBugs. I had not heard about the analysis tool before. It would be nice if it could be integrated into build tools and used in continuous integration environments.)

The Maven support seems to work out of the box. It reads a pom.xml and builds IDEA project files from it, while still using its internal build infrastructure. With Eclipse you have to generate the .project and .classpath files manually, which works but is an extra step, and the Eclipse-specific configuration in pom files can get quite clunky.

The essential things work for Scala. You have code completion, though without implicit type conversion information. I suppose support for implicit type conversions is quite hard to implement: for a given object there are a lot of members callable through some conversion. The best way to implement this would probably be a 2-step process: first show all directly reachable members, then, after hitting Ctrl + Shift again, show all members reachable through conversions. Method parameters are missing for decompiled Scala classes and in the code completion information.
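To illustrate why this is hard, a small sketch: length below is not a member of Int at all, it only becomes callable through an implicit conversion in scope, so code completion would have to search all conversions applicable to the receiver.

object ImplicitCompletion {
    // purely illustrative conversion: let every Int act as a String
    implicit def intToString(i : Int) : String = i.toString

    // the compiler silently rewrites this to intToString(42).length;
    // an IDE offering length on 42 has to know about the conversion
    def demo = 42.length
}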

Besides these minor problems, IntelliJ IDEA is a wonderful tool to develop Scala with, and it does not crash. Go and try it yourself.

20100102

Quickcheck 0.5 release

It's done: Quickcheck version 0.5 is ready. Again one of the nice little releases of Quickcheck.

The 0.5 release added the following features, mainly to provide better readability:
  • Support for an adapter from Generator to Iterable, to allow the use of Java's for-expression in tests (replacing the inner classes).
  • Generated source code for the @Iterables and @Samples annotations on generator factories. The @Iterables annotation was added to support for-expressions directly with Generators, without additional code. The @Samples annotation can be used to generate a factory of sample values from a Generator class.

A number of minor changes are included as well.

Version 0.6 of Quickcheck will be a major rework, after more than 2 years of experience with the specification-based test approach and its development. The main goal is to support further features, for example shrinking (simplification) of failed test values, a major feature still missing in Quickcheck version 0.5.

When I started Quickcheck I would not have thought that it would be possible to develop such a tiny little thing for more than 2 years. There are still more ideas and things that could be implemented and written about, and so little time.

Recently, I've used the 0.5 version in the Scala projects I blogged about (I did not mention that it was tested in any way, though). It was a pleasant experience, given that Quickcheck was not designed with Scala's language features in mind. It works with just a small implicit type conversion and adaptation layer (10 lines or so).
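The adapter boils down to something like the following sketch, assuming Quickcheck's Generator interface (net.java.quickcheck.Generator) with a next() method; the helper names are mine and the exact lines in my projects may differ:

import net.java.quickcheck.Generator

object QuickcheckConversions {
    // run a test against a number of generated values
    def forAll[T](g : Generator[T])(test : T => Unit) {
        for(i <- 1 to 100) test(g.next())
    }

    // expose a Java Generator as a Scala function
    implicit def asFunction[T](g : Generator[T]) : () => T = () => g.next()
}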

Distribution function implementations based on the XorShift random number generator, which I've already worked with, would be nice to have. And while I'm adding new distributions anyway, there are still the distribution functions based on heuristics to implement: for example, a distribution function based on a boundary heuristic that starts the production of values with the minimum value, -epsilon, 0, +epsilon and the maximum value before random generation kicks in. Another item on my ever growing list is a description of patterns related to specification-based testing and Quickcheck. Obviously there is still plenty of work left.
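To make the boundary heuristic concrete, a sketch with an illustrative Gen trait (Quickcheck's actual API differs):

trait Gen[T] { def next() : T }

class BoundaryDoubleGen(min : Double, max : Double, epsilon : Double,
                        random : Gen[Double]) extends Gen[Double] {
    // serve the interesting boundary values first ...
    private var boundaries = List(min, -epsilon, 0.0, epsilon, max)
    def next() : Double = boundaries match {
        case head :: tail => boundaries = tail; head
        // ... then fall back to random generation
        case Nil => random.next()
    }
}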