Monday, 9 April 2007

Exploiting concurrency

As a functional programming language, F# provides many higher-order functions for manipulating data structures. The "map" function is one of the most important ones. The map function takes two arguments, a function and a data structure, and returns a new data structure of the same kind where each element is the result of applying the given function to the corresponding element in the input data structure.

For example, map f [1; 2; 3] computes [f 1; f 2; f 3]. Functions that manipulate other functions in this way (higher-order functions, curried functions and combinators) are the essence of functional programming.
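
For instance, the following snippet (a minimal illustration using the library's List.map) squares each element of a list:

// Square each element of a list using the higher-order "map" function
let square x = x * x
let squares = List.map square [1; 2; 3]   // gives [1; 4; 9]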

An interesting feature of the "map" function is that the applications of the given function "f" can be done concurrently (in parallel). When the function "f" is slow (takes longer than a microsecond), or when it is to be applied many times (such that the whole map takes over a millisecond), a concurrent implementation of the map function can make efficient use of multiple CPUs and significantly outperform the sequential map function.

The F# standard library does not yet provide concurrent versions of such higher-order functions. This is partly because the seemingly simple task of writing such a function is complicated by the many trade-offs involved in its design.

Whilst writing F# for Scientists, I discovered a significant middle ground: a map function with low enough overhead to be useful in a wide variety of situations, without resorting to dangerous optimizations (such as the use of a global thread pool) that can result in deadlocks and race conditions. You'll have to read the book to learn how a more sophisticated "map" function can be implemented!

The parallel map function is most simply written using the asynchronous invocation of a delegate. This is a slightly alien concept for F# programmers. A delegate is essentially the .NET name for a closure, and delegates provide methods called BeginInvoke and EndInvoke that allow them to be executed concurrently on the thread pool. Moreover, this approach handles both the collection of results from the invocations and any exceptions that might be raised during their execution.

To make the asynchronous interface for a closure available, it must be converted into a delegate. This can be done using the System.Converter delegate type.
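
As a minimal sketch of a single asynchronous invocation (using int -> int for concreteness), BeginInvoke queues the application on the thread pool and EndInvoke blocks until the result is available, re-raising any exception thrown by the closure:

// Wrap a closure in a Converter delegate and invoke it asynchronously
let square = new System.Converter<int, int>(fun x -> x * x)
let cookie = square.BeginInvoke(3, null, null)
// ... other work could be done here ...
let result = square.EndInvoke(cookie)   // gives 9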

The following implementation of a parallel map uses this approach:

let global_map f a =
  // Wrap the closure "f" in a delegate to expose BeginInvoke/EndInvoke
  let d = new System.Converter<'a, 'b>(f)
  // Queue every application on the thread pool, then wait for and collect each result
  Array.map (fun x -> d.BeginInvoke(x, null, null)) a |>
  Array.map (fun a -> d.EndInvoke(a))
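
For example, applying it to a small array (a trivial check, purely illustrative; any real speedup requires per-element work that outweighs the thread-pool overhead discussed below):

// Each squaring is dispatched to the .NET thread pool
let squares = global_map (fun x -> x * x) [| 1; 2; 3; 4 |]
// gives [| 1; 4; 9; 16 |]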

Although simple, this map function can be used to distribute computations among CPUs. However, it has several problems. Firstly, the asynchronous invocation relies upon the global .NET thread pool. As the worker threads are not local to each application of the "map" function, they can be deadlocked if this map function is used recursively (i.e. if "f" itself calls "map"). Secondly, there is considerable overhead in queueing function applications on the global thread pool. So the performance of this function can be greatly improved by using a different approach (the "map" function described in F# for Scientists is up to 100x faster).

Despite its shortcomings, this parallel map function should be enough to get you leveraging parallelism and exploiting those dual cores that you paid so much for!

For keen readers wishing to learn more about threading on the .NET platform, I highly recommend Joseph Albahari's mini e-book Threading in C#.
