Saturday, 2 December 2006

Array manipulation

I was just browsing comp.lang.python for the first time and this post caught my eye:

The post gives Matlab and Python code (using numpy) to perform a 1D discrete wavelet transform using the Daubechies D4 wavelet (which is one up from the Haar wavelet).

To a functional programmer, this implementation would benefit from deforesting or lazy evaluation. Specifically, this approach eagerly generates lots of unnecessary arrays. The reason is that this approach is faster in Matlab and Python (and other language with slow interpreters and comparatively-fast whole-array operators, like Mathematica) but it is much slower in compiled general-purpose languages like C, C++, Java, OCaml, SML, Haskell and F#.

The essence of optimising such programs in languages like F# is very interesting. You take the definition of a new array, like "even":

let even = Array.init (n/2) (fun i -> x.(2 * i + 1))

and you replace it with a closure:

let even i = x.(2 * i + 1)

Not only is the latter more succinct, it also evades the allocation and filling of a whole array by representing the array as an implicit data structure. This is a deforesting optimisation because it avoids unnecessary allocation and is actually a simple version of the optimisation that I used to improve upon the performance of Numerical Recipies' simulated annealing solution to the travelling salesman problem in my book OCaml for Scientists.

In F#, this will be lightning fast but in interpreted languages like Matlab and Python, indirecting through a closure invokes the heavy machinery of the general purpose interpreter and, consequently, is many times slower. I've done the test and my OCaml implementation is half as long as the Python and 3x faster. F# will give similar results, of course.

I believe that a lazily evaluated F# extension to IList would be very valuable, just as IEnumerable.map and filter are valuable. It could include common functions like "even" and "odd" (which fetch every other element in an IList subtype, e.g. an array). The discrete wavelet transform could then be written very succinctly and elegantly and would serve as an excellent example of the value of real functional programming.