## Tuesday, 25 May 2010

### Mini hash table

Imagine you're stuck on a desert island with only a few keystrokes and you desperately need to create your own rudimentary hash table. What might you do?

Well, here's one possible solution: a function that takes a sequence of key-value pairs and builds a hash table, returning another function that searches the hash table for the given key in order to find its corresponding value:

```
> let hashtbl xs =
    let spine = [|for _ in xs -> ResizeArray()|]
    // Hash codes can be negative, so map the remainder back into [0, spine.Length):
    let f k =
        let i = k.GetHashCode() % spine.Length
        spine.[if i < 0 then i + spine.Length else i]
    Seq.iter (fun (k, v) -> (f k).Add (k, v)) xs
    fun k -> Seq.pick (fun (k', v) -> if k = k' then Some v else None) (f k);;
val hashtbl : seq<'a * 'b> -> ('a -> 'b) when 'a : equality
```

Here's how you might use it to look up some squares:

```
> let tbl = hashtbl [for x in 0..100 -> x, x * x];;
val tbl : (int -> int)

> tbl 4;;
val it : int = 16

> tbl 12;;
val it : int = 144
```
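One caveat: looking up a key that was never inserted leaves `Seq.pick` with nothing to pick, so it raises a `KeyNotFoundException`. A total lookup can be sketched by wrapping the table in a helper (the name `tblOpt` is our own, not part of the function above):

```
> let tblOpt k =
    try Some(tbl k)
    with :? System.Collections.Generic.KeyNotFoundException -> None;;
val tblOpt : int -> int option

> tblOpt 200;;
val it : int option = None
```

Alternatively, the last line of `hashtbl` could use `Seq.tryPick` to return an option directly.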

## Thursday, 20 May 2010

### Quicksort

The F#.NET Journal just published an article about sorting:

"The quicksort algorithm was invented by Tony Hoare in 1960 and remains one of the most celebrated algorithms and is still of great practical value. Implementing some of the many variations of the quicksort algorithm serves as an excellent introduction to mixed-paradigm programming in F# and the implementation of a production-quality implementation benefits enormously from the use of a substantial number of exotic features of the F# language. Moreover, the quicksort algorithm is amenable to parallelization, which is of increasing relevance in the multicore era, so the performance characteristics of parallelizations of the implementations are also of great interest..."

## Friday, 14 May 2010

### Java vs F#

Dr Cliff Click of Azul Systems, a company specializing in manycore JVM systems, recently published a blog post about the performance of Java compared primarily to C and C++, but also discussing C# and .NET. Three of Cliff's comments are of particular interest:

Under the heading "Places where C/C++ beats Java for obvious reasons":

"Value Types, such as a 'Complex' type require a full object in Java." - Dr Cliff Click

What Cliff forgot to mention is that .NET also provides value types and a far more compelling example than complex numbers is the humble hash table.
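As a minimal sketch (our own example, not from Cliff's post), a user-defined value type in F# is a one-attribute affair. Instances are stored inline, so an array of them is a single flat allocation rather than an array of pointers to heap objects:

```
// A .NET value type: [<Struct>] makes Complex a stack/inline type, not a heap object.
[<Struct>]
type Complex =
    val Re : float
    val Im : float
    new (re, im) = { Re = re; Im = im }

// One contiguous allocation for all 1000 complex numbers:
let zs = Array.init 1000 (fun i -> Complex(float i, 0.0))
```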

Consider the task of filling a hash table with 10,000,000 associations from integers to single-precision floating point numbers. This may be accomplished in Java as follows:

```
package hashtablebenchmark;

import java.util.HashMap;

public class Main {
    public static void main(String[] args) {
        int n = 10000000;

        for (int j = 0; j < 10; ++j) {
            long startTime = System.currentTimeMillis();
            HashMap<Integer, Float> hashtable = new HashMap<Integer, Float>(n);

            for (int i = 1; i <= n; ++i) {
                hashtable.put(i, 1.0f / i);
            }

            System.out.println("m[100] = " + hashtable.get(100));
            long time = System.currentTimeMillis() - startTime;
            System.out.println("Took: " + time / 1e3 + "s");
        }
    }
}
```

The equivalent program in F# is not only shorter but also 17× faster:

```
let n = 10000000
let m = System.Collections.Generic.Dictionary(n)
for i = 1 to n do
    m.[i] <- 1.0f / float32 i
printf "m[100] = %f\n" m.[100]
```

Specifically, Java takes 6.967s initially and 5.733s steady state whereas F# takes only 0.414s.

In fact, F# completes this benchmark so quickly that we would like to give it more work, but Java cannot handle a larger problem without running out of memory on this 4 GB machine.

Elsewhere, Cliff also writes of Java:

"Very Good Multi-Threading Support. Parallel programming is just easier in Java." - Dr Cliff Click

and later:

"Not that I track C# all that closely but... I believe the JIT produces substantially slower code than Java" - Dr Cliff Click

Allow us to demonstrate otherwise. The Computer Language Shootout contains a well-defined benchmark called spectral-norm, for which the fastest Java solution is a 173-line parallel program. That implementation may be rewritten in only 24 lines of F#:

```
open System.Threading.Tasks

let A i j = 1.0 / float((i + j) * (i + j + 1) / 2 + i + 1)

// Parallel matrix-vector multiply: rows are distributed across cores by Parallel.For.
let inline mul A (u: _ []) (v: _ []) =
    Parallel.For(0, v.Length, fun i ->
        let mutable vi = 0.0
        for j = 0 to v.Length - 1 do
            vi <- vi + A i j * u.[j]
        v.[i] <- vi) |> ignore

let AtAu u v =
    let w = Array.create (Array.length u) 0.0
    mul (fun i j -> A i j) u w
    mul (fun i j -> A j i) w v

do
    let n = 5500
    let u, v = Array.create n 1.0, Array.create n 0.0
    for i = 0 to 9 do
        AtAu u v
        AtAu v u
    // vector and Vector.dot come from the F# PowerPack.
    let u, v = vector u, vector v
    printf "%0.9f\n" (sqrt(Vector.dot u v / Vector.dot v v))
```

In the Java version, dozens of lines of code were devoted to parallelism. In contrast, only two lines of code were altered to parallelize the F#. So it would seem that parallel programming is not "just easier in Java".
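For the curious, a fully serial `mul` can be sketched as follows — an ordinary `for` loop over the rows is the only difference from the parallel version:

```
// Serial matrix-vector multiply: same body, but rows are visited by a plain loop.
let inline mul A (u: _ []) (v: _ []) =
    for i = 0 to v.Length - 1 do
        let mutable vi = 0.0
        for j = 0 to v.Length - 1 do
            vi <- vi + A i j * u.[j]
        v.[i] <- vi
```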

The serial Java takes 12.722s initially and 12.299s steady state, whereas the serial F# takes only 12.18s from a cold start. On this 8-core 2×E5405 2.0GHz Xeon, the parallel Java takes 1.839s initially and 1.820s steady state, whereas the parallel F# takes only 1.60s from a cold start. The fact that Java is slower in every single case indicates that the CLR's JIT is not "producing substantially slower code than Java".

Finally, Cliff made no mention of two other design deficiencies that cripple Java's performance. Firstly, its type-erasure approach to generics incurs massive performance degradation for a lot of generic code because it leads to unnecessary boxing. Secondly, the JVM's lack of tail-call elimination is a growing hindrance in the era of functional programming, and the only general-purpose workaround, trampolines, is typically 10× slower than necessary.
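To see where the trampoline overhead comes from, here is a minimal sketch of the technique (our own illustration; the names `Bounce`, `run`, `even` and `odd` are hypothetical). Each would-be tail call is reified as a heap-allocated closure that a driver loop must then force, trading a single jump for an allocation plus an indirect call:

```
// A computation is either finished or suspended behind a closure.
type Bounce<'a> =
    | Done of 'a
    | Call of (unit -> Bounce<'a>)

// The driver loop forces one suspended step at a time, in constant stack space.
let run bounce =
    let mutable b = bounce
    let mutable result = None
    while result.IsNone do
        match b with
        | Done x -> result <- Some x
        | Call f -> b <- f ()   // one allocation + indirect call per "tail call"
    result.Value

// Mutually recursive even/odd expressed through the trampoline:
let rec even n = if n = 0 then Done true else Call(fun () -> odd (n - 1))
and odd n = if n = 0 then Done false else Call(fun () -> even (n - 1))
```

`run (even 1000000)` completes without growing the stack, but every step allocates a closure, which is the source of the order-of-magnitude slowdown relative to genuine tail calls.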

If you want to learn how to write fast parallel programs in F#, read our book Visual F# 2010 for Technical Computing.

## Thursday, 6 May 2010

### Parallelizing the SciMark2 benchmark: part 2

The F#.NET Journal just published an article about parallel programming:

"The SciMark2 benchmark is one of the better benchmarks for programming languages and their implementations in the context of technical computing. This article is the second in a two-part series revisiting the SciMark2 benchmark to examine the parallelization of each of its component tasks. Specifically, the Monte-Carlo, sparse matrix-vector multiplication and LU decomposition tests. The results are of general interest in the context of improving performance on multicores using parallel programming..."