Friday, 25 May 2012

Improving matrix-matrix multiply and prime number programs in F#

Several people have cited the comparison of Erlang, F#, Go and Java from here. Let us present some improved F# solutions.
The parallel matrix-matrix multiply makes heavy use of async for parallel programming, where it is an anti-pattern. So it may be productively rewritten using Tasks as follows:

let randomMatrix =
  let rnd = new System.Random()
  Array2D.init 500 500 (fun _ _ -> rnd.NextDouble())
let matrixMultiply (a:float[,]) (b:float[,]) =
  let rowsA, colsA = Array2D.length1 a, Array2D.length2 a
  let rowsB, colsB = Array2D.length1 b, Array2D.length2 b
  let result = Array2D.create rowsA colsB 0.0
  System.Threading.Tasks.Parallel.For(0, rowsA, fun i ->
    for j in 0 .. colsB - 1 do
      let mutable x = 0.0
      for k in 0 .. colsA - 1 do
        x <- x + a.[i,k] * b.[k,j]
      result.[i,j] <- x
  ) |> ignore
matrixMultiply randomMatrix randomMatrix

This is 60x faster than the original async-based solution. However, this is a poor choice of algorithm because it lacks locality. Better algorithms are tiled or cache oblivious.
Their prime number generator makes heavy use of enumerable sequences which can be productively rewritten using recursion and arrays instead, as follows:

(* A very naive prime number detector *)
let isPrime (n:int) =
  let rec loop i = i*i > n || (n % i <> 0 && loop (i+1))
  loop 2
(* Return primes between m and n using multiple threads *)
let primes m n =
  Array.Parallel.init (n-m+1) (fun i -> m+i, isPrime (m+i))
  |> Array.Parallel.choose (function n, true -> Some n | _ -> None)
(* Run a test *)
while true do
  let timer = System.Diagnostics.Stopwatch.StartNew()
  primes 1000000000 1000100000
  |> ignore
  printfn "%fs" timer.Elapsed.TotalSeconds

This is 13x faster than their original seq-based solution.


David said...

Just surprised to see that the F# compiler properly optimizes the recursive loop as it doesn't look like a tail call at first glance.

Jon Harrop said...

@David: That recursive call is actually in tail position because the boolean operators are short-circuit evaluated.

So the code "i*i > n || (n % i <> 0 && loop (i+1))" is equivalent to "if i*i>n then true else if n%1 <> 0 then loop (i+1) else false".