Sunday, 25 November 2012

Improving grid performance

The F#.NET Journal just published an article about WPF:
"Many enterprise software applications require the ability to visualize a grid quickly and easily. This article looks at the performance of using the built-in WPF Grid control for a modest 80x120 grid of numbers and describes the design and implementation of a bespoke FastGrid control that uses WPF's low-level rendering APIs to make startup 16x faster..."
To read this article and more, subscribe to The F#.NET Journal today!

Saturday, 24 November 2012

Basic use of LLVM from F#

The F#.NET Journal just published an article about metaprogramming:
"The low-level virtual machine (LLVM) is a freely-available high quality code generation library with support for many different architectures and platforms including x86, x64 and ARM. Programs are conveyed to LLVM in its intermediate representation (IR) which is an infinite register virtual machine. LLVM supports both static and JIT compilation of IR to native code. This article describes a compiler written in F# that can compile programs written in a small language to LLVM IR (using PInvoke) in order to evaluate them interactively..."
To read this article and more, subscribe to The F#.NET Journal today!

Sunday, 11 November 2012

The F# Software Foundation (FSSF)

Several members of the F# community have combined forces to found The F# Software Foundation (FSSF) with the intention to "promote, protect, and advance the F# programming language, and to support and facilitate the growth of a diverse and international community of F# programmers". Note that Microsoft are still entirely responsible for F# on its native platform: .NET on Windows.

Although the FSSF was only just created, their website is already a gold mine of F#-related links. The "Getting F#" section refers to explanations of how to install F# on Mac, Linux, Windows, Android and iOS as well as HTML5 and GPGPU.

We fully support this movement to encourage wider adoption of F# and, in fact, we are currently experimenting with F# on the Raspberry Pi and intend to look at F# on the Google Nexus 7 tablet and the forthcoming Microsoft Surface Pro tablet.

Wednesday, 7 November 2012

SAT Solver

The F#.NET Journal just published an article about logic programming:
"Satisfiability solvers or SAT solvers are programs than solve logical problems. A boolean SAT solver is one than solves the problem of finding sets of variable bindings for which a given boolean expression evaluates to true..."
To read this article and more, subscribe to The F#.NET Journal today!

Sunday, 14 October 2012

Numerical integration

The F#.NET Journal just published an article about numerical methods:
"Definite integrals can be calculated approximately using a variety of techniques known collectively as numerical integration algorithms. The ability to calculate definite integrals has many practical applications including solving ordinary differential equations. This article describes several basic algorithms for numerical integration and a technique that allows these algorithms to be applied to both finite and infinite limits..."
To read this article and more, subscribe to The F#.NET Journal today!

Wednesday, 10 October 2012

Simulating a concurrent system

The F#.NET Journal just published an article about concurrent programming:
"Distributed concurrency can be complicated and, consequently, there can be great value in having a complete working prototype to study. The simplicity of asynchronous agents in F# makes them ideal for writing simulation code. This article describes the design of a simple concurrent system that distributes incoming messages between two separate concurrent systems in order to duplicate subsequent processing in order to provide fault tolerance..."
To read this article and more, subscribe to The F#.NET Journal today!

Wednesday, 5 September 2012

Structural typing

The F#.NET Journal just published an article about reflection:
"Static type systems enforce constraints at compile time by using types as contracts that must be adhered to. Nominal typing compares types according to the explicit interfaces they implement. If two type explicitly implement the same interface then they can be used interchangeably within the context of that interface. Structural typing is an alternative to nominal typing where the comparison of types is based upon the structure of the types rather than explicit interfaces..."
To read this article and more, subscribe to The F#.NET Journal today!

Case study: porting a commercial compiler from OCaml to F#

This post presents one of our recent pieces of consultancy work as a case study. This case is interesting because it concerns a significant commercial code base ported from OCaml to F#, the introduction of parallelism to the code base and the performance of F# compared to OCaml in the context of a compiler (where OCaml is regarded as being very fast).

We recently ported a compiler written in OCaml to the F# programming language for a client and performance turned out to be an issue. The precise details cannot be disclosed under our contract.

The original compiler was 15,000 lines of OCaml code. The compiler was regularly applied to large code bases and the C code that it produces can be considerable so compilation can take minutes. Consequently, the compiler's performance is valued by our client's customers and, therefore, the original code had been optimized for OCaml's performance characteristics.

A direct translation of the OCaml code to F# proved to be over 10× slower. This was so slow that it impeded testing of the initial translation so some optimization was done before the first working version. Specifically, profiling indicated that the biggest problem was the high rate of exceptions being raised and caught. Exceptions are around 600× slower on .NET than in OCaml and are often used for ordinary control flow in OCaml so direct translations to F# often suffer from poor performance as a consequence. In this case, the hot paths were changed to use union types (usually option types) instead of exceptions. Although this incurs a lot of unnecessary boxing in F# the performance improvements were substantial and the F# version became 5x slower than the OCaml.

Surprisingly, thorough testing showed that our almost-blind initial translation of 15,000 lines of code was completely error free. I think this is a real testament to the power of ML's static type system. In fact, the only error found to date was introduced during subsequent optimization.

After demonstrating the correctness of the translation, effort turned to trying to improve performance in an attempt to compete with the original OCaml code. We expected this to be extremely difficult because OCaml is very heavily optimized for exactly this kind of workload. However, we were actually able to exceed OCaml's performance. This was accomplished by taking a regimented approach where the program was profiled to identify the main hot spots and actions were taken to improve overall performance but only the most effective optimizations were retained before repeating the process and all results (including failures) were carefully recorded. This allowed us to produce a fast code base that was still as familiar as possible to its developers. In particular, the support for parallel programming in .NET 4 and F# 2.0 proved invaluable and was the final boost we needed to exceed OCaml's performance.

This case study proves that it is possible to migrate significant commercial code bases from OCaml to F# without suffering performance degradation. Beyond metaprogramming, we have found that many practical applications see substantial speed improvements when moved from OCaml to F# often due to better support for multicore but also better support for unboxing.

Saturday, 18 August 2012

Tic-tac-toe revisited

The F#.NET Journal just published an article about concurrent GUI programming:
"This article rearchitects a GUI application from a previous article to use an asynchronous concurrent design. The result is a clean separation between view and model that closely resembles a client-server architecture due to the use of explicit message passing. This design pattern is generally applicable..."
To read this article and more, subscribe to The F#.NET Journal today!

Thursday, 2 August 2012

Cancellable streams

The F#.NET Journal just published an article about concurrency:
"Applications such as searching and visualization that request information from servers often have the potential to overload the server by sending too many requests before the handling of existing requests has been completed. A simple and effective solution to this problem is cancellation. This article examines a simple system of two communicating sequential processes that use cancellation to implement a server that accepts requests, returns responses and uses cancellation to tear down outstanding requests in order to avoid system overload..."
To read this article and more, subscribe to The F#.NET Journal today!

Sunday, 29 July 2012

Asynchronous bounded queues

The F#.NET Journal just published an article about concurrency:
"The MailboxProcessor in the F# standard library uses unbounded queues that can grow to be arbitrarily long. Consequently, the queue in a consumer can grow without limit if its producers overwhelm it with data. This article examines an elegant solution to this problem in the form of a bounded queue with asynchronous enqueue and dequeue operations that cause both producers and consumers to cooperatively suspend themselves until capacity or data are available to satisfy their request. Furthermore, this simpler architecture neatly decouples the sequential processes from the concurrent queues they use to communicate..."
To read this article and more, subscribe to The F#.NET Journal today!

Friday, 29 June 2012

Pathological garbage collector behaviour


The F#.NET Journal just published an article about garbage collection:
"When performance is important in a garbage collected language such as F# it is useful to have some understanding of the costs of operations that the garbage collector performs. This article provides an introduction to garbage collection, an overview of the processes performed by generational garbage collectors for multithreaded programs and goes on to examine some F# programs that deliberately exhibit pathological behaviour and presents workarounds that alleviate or even eliminate the costs associated with garbage collection on .NET..."


To read this article and more, subscribe to The F#.NET Journal today!

Thursday, 21 June 2012

Purely functional hash tries

The F#.NET Journal just published an article about purely functional data structures:
"Purely functional data structures such as the Set and Map collections provided by the F# standard library have many advantages over traditional imperative collections, particularly clarity due to the absence of mutation, but their performance can be much worse. Benchmarks have shown the F# Set and Map running up to 40× slower than the imperative HashSet and Dictionary collections provided by .NET. This article describes the design and implementation of a hash trie implementation of a set that is up to 7× faster to construct and up to 4.5× faster to search than the F# set..."
To read this article and more, subscribe to The F#.NET Journal today!

Monday, 18 June 2012

Taylor series expansion of cosine that is generic over number type


Taylor series expansion of cos(x) about x=0:
cosine(x) = 1 - x²/2! + x⁴/4! - x⁶/6! + ...
Let's start by writing a float-specific version in F#:
let cosine n x =
  let rec loop i q t c =
    if i=n then c else
      loop (i + 1) (q + 10 + 8*i) (-t * x * x / float q) (c + t)
  loop 0 2 1.0 0.0
For example, we can compute 1M terms of the expansion of x=0.1:
cosine 1000000 0.1
The best way to make this polymorphic in F# is to parameterize the function over the operators it uses and mark it as inline in order to remove the performance overhead of this parameterization:
let inline cosine zero one ofInt ( ~-. ) ( +. ) ( *. ) ( /. ) n x =
  let rec loop i q t c =
    if i=n then c else
      loop (i + 1) (q + 10 + 8*i) (-.t *. x *. x /. ofInt q) (c +. t)
  loop 0 2 one zero
Now we can compute 1M terms using float like this, which is just as fast as before:
cosine 0.0 1.0 float ( ~- ) (+) (*) (/) 1000000 0.1
But we can also do single-precision float:
cosine 0.0f 1.0f float32 ( ~- ) (+) (*) (/) 1000000 0.1f
And arbitrary-precision rational:
cosine 0N 1N BigNum.FromInt (~-) (+) (*) (/) 10 (1N / 10N)
And even symbolic:
type Expr =
  | Int of int
  | Var of string
  | Add of Expr * Expr
  | Mul of Expr * Expr
  | Pow of Expr * Expr

  static member (~-) f = Mul(Int -1, f)
  static member (+) (f, g) = Add(f, g)
  static member (*) (f, g) = Mul(f, g)
  static member (/) (f, g) = Mul(f, Pow(g, Int -1))

cosine (Int 0) (Int 1) Int (~-) (+) (*) (/) 3 (Var "x")
To make it faster, hoist the common subexpression -x*x out of loop.

Friday, 25 May 2012

Graph coloring in F#

Eric Lippert of the Visual C# compiler team at Microsoft wrote a fascinating series of five blog posts about graph coloring. The articles develop a C# program with 20 lines of code devoted to defining the problem (coloring the planar graph of borders between 13 South American countries) followed by another 200 lines of C# code to solve the graph coloring problem. In the fourth article, this culminates in the solution 0, 1, 2, 1, 2, 1, 0, 2, 0, 1, 2, 1, 3 that uses just three colors to solve the problem.
Let's solve the same problem using the F# programming language instead. We begin by defining the problem using a union type to represent the countries and nested lists to represent the adjacencies:
type Country =
  | Brazil | FrenchGuiana | Suriname | Guyana | Venezuala | Colombia
  | Ecuador | Peru | Chile | Bolivia | Paraguay | Uruguay | Argentina
 
let countries =
  [|Brazil; FrenchGuiana; Suriname; Guyana; Venezuala; Colombia;
    Ecuador; Peru; Chile; Bolivia; Paraguay; Uruguay; Argentina|]
 
let edges =
  [
    Brazil, [ FrenchGuiana; Suriname; Guyana; Venezuala; Colombia;
              Peru; Bolivia; Paraguay; Uruguay; Argentina ]
    FrenchGuiana, [ Brazil; Suriname ]
    Suriname, [ Brazil; FrenchGuiana; Guyana ]
    Guyana, [ Brazil; Suriname; Venezuala ]
    Venezuala, [ Brazil; Guyana; Colombia ]
    Colombia, [ Brazil; Venezuala; Peru; Ecuador ]
    Ecuador, [ Colombia; Peru ]
    Peru, [ Brazil; Colombia; Ecuador; Bolivia; Chile ]
    Chile, [ Peru; Bolivia; Argentina ]
    Bolivia, [ Chile; Peru; Brazil; Paraguay; Argentina ]
    Paraguay, [ Bolivia; Brazil; Argentina ]
    Argentina, [ Chile; Bolivia; Paraguay; Brazil; Uruguay ]
    Uruguay, [ Brazil; Argentina ] 
  ]
These are the vertices and edges of our graph.
The countries of South America are, of course, a planar graph so we know they can be assigned colors with no two adjacent countries having the same color using a total of no more than four different colors. Let us define the set of all four possible colors as follows:

let allColors = set[0..3]

The graph coloring problem may then be solved by adding our vertices and edges to an empty graph, generating a sequence of possible colorings at each step:

let solve (V, E)  =
  ((Set.empty, seq[Map.empty]), V)
  ||> Seq.fold (fun (vertices, solutions) vertex ->
    let neighbors =
      [|for src, dsts in E do
          if src = vertex then yield! Set.intersect vertices (set dsts) else
            for dst in dsts do
              if dst = vertex && Set.contains src vertices then
               yield src|]
    Set.add vertex vertices,
    seq { for color in solutions do
            for newColor in allColors do
              if Seq.forall (fun u -> color.[u] <> newColor) neighbors then
                yield Map.add vertex newColor color })

Eric's result may then be obtained by taking the first of the valid colorings and using it to enumerate the colors of each country in turn:

let answer =
  let _, colors = solve (countries, edges)
  let color = Seq.head colors
  [ for country in countries -> color.[country] ]

Remarkably, we have solved the same problem in just 19 lines of F# code! One can only imagine how much simpler Microsoft's Roslyn compiler could have been had it been written in F#...

Improving matrix-matrix multiply and prime number programs in F#

Several people have cited the comparison of Erlang, F#, Go and Java from here. Let us present some improved F# solutions.
The parallel matrix-matrix multiply makes heavy use of async for parallel programming, where it is an anti-pattern. So it may be productively rewritten using Tasks as follows:

let randomMatrix =
  let rnd = new System.Random()
  Array2D.init 500 500 (fun _ _ -> rnd.NextDouble())
 
let matrixMultiply (a:float[,]) (b:float[,]) =
  let rowsA, colsA = Array2D.length1 a, Array2D.length2 a
  let rowsB, colsB = Array2D.length1 b, Array2D.length2 b
  let result = Array2D.create rowsA colsB 0.0
  System.Threading.Tasks.Parallel.For(0, rowsA, fun i ->
    for j in 0 .. colsB - 1 do
      let mutable x = 0.0
      for k in 0 .. colsA - 1 do
        x <- x + a.[i,k] * b.[k,j]
      result.[i,j] <- x
  ) |> ignore
  result
matrixMultiply randomMatrix randomMatrix

This is 60x faster than the original async-based solution. However, this is a poor choice of algorithm because it lacks locality. Better algorithms are tiled or cache oblivious.
Their prime number generator makes heavy use of enumerable sequences which can be productively rewritten using recursion and arrays instead, as follows:

(* A very naive prime number detector *)
let isPrime (n:int) =
  let rec loop i = i*i > n || (n % i <> 0 && loop (i+1))
  loop 2
 
(* Return primes between m and n using multiple threads *)
let primes m n =
  Array.Parallel.init (n-m+1) (fun i -> m+i, isPrime (m+i))
  |> Array.Parallel.choose (function n, true -> Some n | _ -> None)
 
(* Run a test *)
while true do
  let timer = System.Diagnostics.Stopwatch.StartNew()
  primes 1000000000 1000100000
  |> ignore
  printfn "%fs" timer.Elapsed.TotalSeconds

This is 13x faster than their original seq-based solution.

Monday, 14 May 2012

Parallel programming in functional languages


Functional programming has immutable data structures and no side effect which are inherently suitable for parallel programming.
That is a common misconception. Parallelism is solely about performance and purity usually degrades performance. So purely functional programming is not a good starting point if your objective is performance. In general, purity means more allocation and worse locality. In particular, purely functional data structures replace arrays with trees and that incurs many more allocations and places far more burden on the garbage collector.
For example, measure the performance of the elegant purely functional "quicksort" in Haskell. Last time I checked, it was thousands of times slower than Sedgewick's conventional imperative solution on my machine.
Also, nobody has ever managed to implement an efficient dictionary data structure (pure or impure) or an efficient purely-functional sort in Haskell and nobody has figured out how to write an asymptotically efficient persistent disjoint set data structure and there is no known way to implement other basic data structures like purely functional weak dictionaries!
Moreover, the garbage collector in the defacto-standard Haskell implementation (GHC) is heavily optimized for pure code at the expense of mutation. For example, GHC's hash table is still 26× slower than .NET's. Historically, the performance of mutation was considered so unimportant in Haskell that writing a single pointer to an array was an O(n) operation in GHC for five years.
I investigate how to exploit multicore computation in a functional language, and target production code for some numerical applications.
The best way I have found is to learn how to write decent parallel programs for multicores in the imperative style (study Cilk, in particular) and then factor the code using first-class functions and tail call elimination into the impure functional style.
This means cache oblivious data structures and algorithms. Nobody has ever done this in Haskell. Indeed, none of the research published on parallel Haskell to-date has even mentioned the essential concept of cache complexity. Furthermore, although it is widely known that non-strict (aka lazy) evaluation renders space consumption unpredictable, it is not yet widely appreciated that this same problem renders scalability wildly unpredictable on multicores.
F# has Microsoft behind its back, and its parallel constructs such as PLINQ, TPL, Async Workflow have been well-documented and shown some potentials.
They are well beyond showing potential. Thousands of commercial applications have been built upon those industrial-strength foundations.
However, research about parallelism in Haskell is very active at the moment, and it posseses many nice features which haven't supported by F# yet:
You are assuming they are "nice features"? Read Simon Marlow's latest paper about this:
"...a combination of practical experience and investigation has lead us to conclude that this approach is not without drawbacks. In a nutshell, the problem is this: achieving parallelism with par requires that the programmer understand operational properties of the language that are at best implementation-defined (and at worst undefined). This makes par difficult to use, and pitfalls abound — new users have a high failure rate..."
My question is which language should I choose for functional parallelism?
I'd advise against parallel purely functional code for production because it is a complete wild card. Assuming you're happy to sacrifice some purity in order to attain competitive performance, I use and recommend F#.

Applications of priority queues

Priority queues are an abstract collection that allow random insertion of new elements and removal of the smallest or largest element. Here are a couple of applications of priority queues:

Heaps are a family of concrete data structures that can be used to implement priority queues efficiently. For implementations of heaps in F# see the article Data structures: Heaps in the F#.NET Journal.