Saturday, 9 December 2006

Concurrency (continued)

Well, I've translated about 15kLOC of OCaml/OpenGL to F#/DirectX now: my Smoke vector graphics engine. The translation has been amazingly painless, especially considering the commercial benefits of opening the Windows market.

Smoke's basic design revolves around a scene graph that appears to be purely functional to the user but actually caches as much as possible to improve performance (this is why Smoke is 10x faster than Microsoft's Avalon, Adobe's SVG viewer and Macromedia's flash for complex scenes). I've skipped most of this translation because DirectX lacks a lot of vital functionality compared to OpenGL, specifically display lists (that transparently compile static geometry and upload it to the graphics card) and a tesselator (that converts winding-rule polygons into triangles).

I spent two weeks tracking down a really annoying bug. This bug manifested itself as access violations from Managed DirectX (not supposed to be possible!), corrupted IL (also not supposed to be possible) and weird-looking corrupted renderings. I originally blamed the problem on Windows Forms not being thread safe. Having scoured the docs for information on that I started to blame Managed DirectX.

As it happens, the problem was caused by an innocuous-looking closure in my F# code. I had replaced a cached OpenGL display list with a cached closure that would render a geometry. However, different visualisation windows can share the same scene graph and, particularly, the same caches. And this cached closure was caching the DirectX device that it was rendering to. Consequently, the OnPaint method of one window, running in one thread, was trying to write to a device that belonged to another thread. Hence the corruption. It seems that DirectX doesn't check for this and, consequently, it was sometimes resulting in memory corruption.

Aside from this one annoying bug, the ability to perform computations concurrently in F# is nothing short of miraculous. My immutable OCaml code runs concurrently with no changes. I only used immutable data structures four times in 15kLOC of code. Once I had wrapped these with locks, to prevent different threads from accessing them simultaneously, the entire 15kLOC was ready to exploit my dual core system. Sure enough, performance is almost doubled!

There's no way this could have been done in two weeks had I been using C++ or C#...

I'll upload a demo showing the graphics capabilities of my libraries ASAP. In the mean time: happy hacking!


Monday, 4 December 2006

Windows forms and Threads

I've been trying to get the vector graphics renderer from Presenta to play nice with the F# interactive mode. I only just discovered the problem and, in fact, I think it is the same problem that wreaks havoc with all Windows Forms applications when run from the interactive mode (including the bundled F# samples).

The problem arises when you try to spawn forms and continue using the interactive mode. The UIs are then executing on the same thread as the interactive mode itself. This would be fine provided everyone played nice and yielded every now and again using Application.DoEvents() but the interactive mode blocks waiting for user input. If you're playing with one of the F# samples from the interactive mode and its GUI freezes then try hitting return in the interactive mode to make the GUI responsive again.

Anyway, I tried to fix this in various ways, spawning threads, and had no luck... until today. It turns out that the fix is really easy. The fundamental problem is that Windows Forms are not thread safe. They use a "message loop" to convey things like the need for repaint and this whole loop must execute on the UI thread and not in any other thread. You can invoke messages from other threads but, if you just want to spawn a background window (e.g. for visualisation) and don't care what happens to it, there is no need to worry about invoking messages.

So all you do to create a completely separate "application" is spawn a thread, create your form and apply Application.Run(form) to it. The call to Application.Run sets up a completely new message loop for that thread. I've been playing with it this afternoon and it seems to work perfectly. I can spawn multiple windows visualising different data from the interactive mode and they all run beatifully. Moreover, they can take advantage of my dual core because they are running in separate threads.

One final discovery: there is a thread.IsBackground boolean property that is false by default. All background threads are killed when the last foreground thread dies. For example, if you have a GUI thread and a worker thread then you can make the GUI thread the foreground thread (the default setting) and set IsBackground in the worker thread to true so that it is killed when the GUI is closed. I've been bitten by not doing this during development, when a process hangs around after I believed I had finished it and I can't then rebuild because the EXE is in use.

Anyway, I'll be putting lots of essential bits of information like this in my book "F# for Scientists"...

Sunday, 3 December 2006

Array manipulation (continued)

I've been studying the slice-based approach to array manipulation used by the Python and Matlab implementations of a 1D, 4-tap discrete wavelet transform. I am particularly interested in this because of the idea of incorporating Matlab-like array slicing in future versions of F#.

Here is the Python code (written by Sturla Molden):

def D4_Transform(x, s1, d1, d2):
odd = x[1::2]
even = x[:-1:2]
d1[:] = odd[:] - C1*even[:]
s1[:-1] = even[:-1] + C2*d1[:-1] + C3*d1[1:]
s1[-1] = even[-1] + C2*d1[-1] + C3*d1[0]
d2[0] = d1[0] + s1[-1]
d2[1:] = d1[1:] + s1[:-1]
even[:] = C4 * s1[:]
odd[:] = C5 * d2[:]
if x.size > 2:

The discrete wavelet transform computes downsampled convolutions repeatedly until no data remains. My F# implementation of this transform is:

let rec d4_aux a n =
let n2 = n lsr 1 in
let tmp = Array.make n 0. in
for i=0 to n2-2 do
tmp.(i) <- a.(i*2)*.h0 +. a.(i*2+1)*.h1 +. a.(i*2+2)*.h2 +. a.(i*2+3)*.h3;
tmp.(i+n2) <- a.(i*2)*.g0 +. a.(i*2+1)*.g1 +. a.(i*2+2)*.g2 +. a.(i*2+3)*.g3;
tmp.(n2-1) <- a.(n-2)*.h0 +. a.(n-1)*.h1 +. a.(0)*.h2 +. a.(1)*.h3;
tmp.(2*n2-1) <- a.(n-2)*.g0 +. a.(n-1)*.g1 +. a.(0)*.g2 +. a.(1)*.g3;
Array.blit tmp 0 a 0 n;
if n > 4 then d4_aux a (n lsr 1)

Firstly, this could be simplified by using a new type to represent a cyclic array. However, unless the modulo indexing was hoisted from loops over the array, this would adversely affect performance.

Note that the F# is both shorter and faster than the Python. In fact, the F# is as fast as C++!

The next thing to note is that the the Python code doesn't use slicing because it is a good idea in general but, rather, because it evades touching the arrays element-wise (which would be very slow in an interpreted language like Python). So, are slices useful?

My F# program could be written more succinctly using slices:

let rec d4_aux tmp a n =
let n2 = n lsr 1 in
for i=0 to n2-2 do
tmp.(i) <- dot h a.[i*2:];
tmp.(i+n2) <- dot g a.[i*2+1:];
Array.blit tmp 0 a 0 n;
if n > 4 then d4_aux a (n lsr 1)

Where a.[i:] denotes a kind of IEnumerable starting at index "i".

This has saved quite a bit of code but optimising this back to the original is quite difficult.

So I think slices would be useful in F#. However, I think they will be much more useful if they are part of a generic framework for handling random-access containers, like IList, such that you can take slices of slices, or slices of random-access containers that aren't arrays (e.g. functional arrays implemented as balanced binary trees).

On a related note, I'd like an equivalent of OCaml's stream parsing syntax extension. This could be done using the upcoming feature "active patterns" to dissect an IEnumerable (now called Seq).

In summary, I think slices will be useful in F# but I don't think they will be as prevelant in F# as they are in languages like Matlab and Python because F# already provides very fast element-wise access.

Saturday, 2 December 2006

Array manipulation

I was just browsing comp.lang.python for the first time and this post caught my eye:

The post gives Matlab and Python code (using numpy) to perform a 1D discrete wavelet transform using the Daubechies D4 wavelet (which is one up from the Haar wavelet).

To a functional programmer, this implementation would benefit from deforesting or lazy evaluation. Specifically, this approach eagerly generates lots of unnecessary arrays. The reason is that this approach is faster in Matlab and Python (and other language with slow interpreters and comparatively-fast whole-array operators, like Mathematica) but it is much slower in compiled general-purpose languages like C, C++, Java, OCaml, SML, Haskell and F#.

The essence of optimising such programs in languages like F# is very interesting. You take the definition of a new array, like "even":

let even = Array.init (n/2) (fun i -> x.(2 * i + 1))

and you replace it with a closure:

let even i = x.(2 * i + 1)

Not only is the latter more succinct, it also evades the allocation and filling of a whole array by representing the array as an implicit data structure. This is a deforesting optimisation because it avoids unnecessary allocation and is actually a simple version of the optimisation that I used to improve upon the performance of Numerical Recipies' simulated annealing solution to the travelling salesman problem in my book OCaml for Scientists.

In F#, this will be lightning fast but in interpreted languages like Matlab and Python, indirecting through a closure invokes the heavy machinery of the general purpose interpreter and, consequently, is many times slower. I've done the test and my OCaml implementation is half as long as the Python and 3x faster. F# will give similar results, of course.

I believe that a lazily evaluated F# extension to IList would be very valuable, just as and filter are valuable. It could include common functions like "even" and "odd" (which fetch every other element in an IList subtype, e.g. an array). The discrete wavelet transform could then be written very succinctly and elegantly and would serve as an excellent example of the value of real functional programming.

Raytracer posted on Reddit

There's a discussion about my raytracer and F# in general following a post about my ray tracer on Reddit: