Thursday, 30 December 2010

Performance-related features in F# and C#

Many people expect the performance of F# to be comparable to the performance of C# "because they both compile to the same intermediate language (CIL)". Although the VM obviously places the same ceiling on the performance of both F# and C# there are some features that affect performance differently between the two language implementations.

Firstly, C# and F# don't actually compile down to the same CIL because only F# generates tail calls. This affects performance when mutually recursive functions that represent the different states of a state machine tail call each other in order to move between states. This design pattern often appears in the context of asynchronous workflows that can tail call each other in F# but this will not even be possible with the new async support in C# 5.

Secondly, the implementation of delegates on .NET is currently quite inefficient and, consequently, F# uses its own FastFunc type for high-performance first-class functions.

Thirdly, F# uses .NET metadata to convey inline functions so that they can be exported across APIs and, of course, that can dramatically improve performance in certain circumstances. Moreover, F#'s inlining allows functions passed as arguments to higher-order functions to be completely inlined and type specialized. Our F# for Numerics library makes extensive use of this feature to ensure that per-type functions such as comparison are specialized, giving performance in F# up to 2,350× faster than the equivalent C# (!). In fact, some of our numerical routines are consistently several times faster than vendor-tuned Fortran in libraries like the Intel MKL.

Finally, pattern matching can be extremely laborious to express in C# because the language lacks pattern matching but it is almost impossible to maintain optimized C# code equivalent to many non-trivial pattern matches. In contrast, the F# compiler aggressively optimizes pattern matches during compilation.

Conversely, the C# compiler can be better at optimizing computations over value types (e.g. complex arithmetic).


Tuesday, 14 December 2010

Parsing mathematical expressions using active patterns

Active patterns allow arbitrary user-defined functions to be used to dissect values during pattern matching. This opens up some interesting possibilities with regard to parsing because it allows active patterns to be used to destructure streams of data when known patterns are encountered.

The following code sample creates several mutually-recursive active patterns that form a recursive descent parser for mathematical expressions:

> let rec (|Term|_|) = function
| Factor(e1, t) ->
let rec aux e1 = function
| '+'::Factor(e2, t) -> aux (e1 + e2) t
| '-'::Factor(e2, t) -> aux (e1 - e2) t
| t -> Some(e1, t)
aux e1 t
| _ -> None
and (|Factor|_|) = function
| '-'::Factor(e, t) -> Some(-e, t)
| Atom(e1, '*'::Factor(e2, t)) -> Some(e1 * e2, t)
| Atom(e, t) -> Some(e, t)
| _ -> None
and (|Atom|_|) = function
| c::t when '0'<=c && c<='9' -> Some(int(string c), t)
| '('::Term(e, ')'::t) -> Some(e, t)
| _ -> None;;
val ( |Term|_| ) : char list -> (int * char list) option
val ( |Factor|_| ) : char list -> (int * char list) option
val ( |Atom|_| ) : char list -> (int * char list) option

For example, the following parses the expression 1 + 2 × (3 - 4) × -5:

> let (Term e) = List.ofSeq "1+2*(3-4)*-5";;
val e : int * char list = (11, [])

The F# programming language is descended from the MetaLanguages (ML) family that were specifically bred for metaprogramming including parsing. For more information on metaprogramming with F#, read the following articles in The F#.NET Journal:

Tuesday, 30 November 2010

"A Taste of F#" video lecture by Don Syme

Without further ado, here is Don Syme's recent Tech-Ed lecture on F# today and in the future:










Get Microsoft Silverlight

"F# is like waterfowl..."

Winner of the British Computer Society’s Roger Needham award, Byron Cook, described why F# is his language of choice in a recent interview:

"F# is like waterfowl: it looks very graceful and calm from above, but below the water its legs are furiously working to protect you from the reality below..."


What's your preferred .NET programming language?

Zoho are running a poll about language preference on .NET:

The results show a surprisingly large proportion of developers now choosing F#, even more than VB!

Monday, 22 November 2010

Checking for pangrams

A pangram is a phrase that uses every letter of the alphabet. The following F# function can be used to check a string to see if it constitutes a pangram:

> let isPangram (str: string) = (set['a'..'z'] - set(str.ToLower())).IsEmpty;;
val isPangram : string -> bool

The function works by computing a set difference, removing the set of letters used in the string from the set of letters in the alphabet. If the resulting set is empty then the string must have used every letter in the alphabet and, therefore, it is a pangram.

Note how easy F# makes it to build sets from arbitrary sequences (lists and strings of characters, in this case) and perform set theoretic operations such as set difference (and union and intersection) on them. This basic functionality has a wide variety of applications ranging from the manipulation of strings seen here to the computation of nth-nearest neighbors in computational science (see the article Traversing networks: the nth-nearest neighbor in The F#.NET Journal).

For example, the following verifies a popular English pangram:

> isPangram "The quick brown fox jumped over the lazy dogs";;
val it : bool = true

Sunday, 21 November 2010

Computational geometry: quick hull

The F#.NET Journal just published an article about computational geometry:

"Computing the convex hull of a set of points is a challenge of fundamental importance in computational geometry. Other tasks such as Delaunay triangulation can be built upon a convex hull algorithm and then tasks such as Ruppert's algorithm may be built upon those. This article describes the design and implementation of the quickhull algorithm in just 10 lines of F# code before going on to build a WPF-based GUI application that can be used to manipulate a convex hull interactively..."

To read this article and more, subscribe to The F#.NET Journal today!

Sunday, 14 November 2010

Visualizing a complete graph

Kean Walmsley posted an article describing how AutoCAD can be used to visualize a complete graph using vector graphics. This prompted Cay Horstmann to publish an article describing how the SPDE framework makes it possible to visualize the same thing using only 12 lines of Scala code. Here's how you can do the same thing in F# out-of-the-box in only 10 lines of code:

> #r "PresentationCore.dll";; > #r "PresentationFramework.dll";; > #r "System.Xaml.dll";; > #r "WindowsBase.dll";; > open System.Windows;; > let n = 24;; val n : int = 24 > let p i = let t = float(i % n) / float n * 2.0 * System.Math.PI Point(1.0 + sin t, 1.0 + cos t);; val p : int -> Point > let shape = Shapes.Polyline(Stroke=Media.Brushes.Black, StrokeThickness=0.001);; val shape : Shapes.Polyline > for i=0 to n-1 do for j=i+1 to n do Seq.iter shape.Points.Add [p i; p j];; val it : unit = () > Window(Content=Controls.Viewbox(Child=shape)) |> Application().Run |> ignore;;

When run from an F# interactive session, this program produces the following output:

Being able to solve simple problems simply and scale it up to sophisticated commercial software is why I love F#!

Wednesday, 10 November 2010

Concurrent programming: TCP relay server

The F#.NET Journal just published an article about concurrent programming:

"Asynchronous workflows are an incredibly useful feature of the F# programming language in the context of concurrent programming. This article describes a simple concurrent program that provides a relay server that can be instructed to connect to a TCP server and forward messages to it..."

To read this article and more, subscribe to The F#.NET Journal today!

Saturday, 6 November 2010

F# reaching a wider audience

There are now dozens of F#-related articles at The Code Project covering a variety of interesting topics.

Some vendors have also started to publish literature describing how their products can be used from F#. For example, the Extreme Optimization high-performance numerical libraries for .NET now provide examples in F#.

Recent events like Microsoft's PDC 2010 including a lecture from Don Syme on The Future of F# and the recent release of the F# compiler and tool stack as open source software have massively boosted the visibility of F# which is great to see, not least because sales of Visual F# 2010 for Technical Computing just doubled!


349 people following Don Syme on Twitter

Don Syme now has 349 followers on Twitter. Doesn't sound like an amazing accomplishment until you learn that Don has never tweeted anything in his life! Apparently he is too busy flying around the world giving awe-inspiring presentations about the F# programming language he created.

According to Phil Trelford, Twitter is hot for F# right now so I couldn't help but get my own account and start tweeting!


Wednesday, 27 October 2010

F# video interviews with Don Syme (parts 1-4)

David Gristwood has posted a fascinating four-part series of video interviews with the creator of the F# programming language, Don Syme:

  1. An introduction to F#.

  2. An F# tutorial.

  3. F# and Windows Azure.

  4. F# in use at Microsoft Research.

In addition to our old F# for Scientists book that Don mentions in the interview, you may also appreciate our new Visual F# 2010 for Technical Computing book.

Tuesday, 19 October 2010

Graph algorithms: topological sort and all-pairs shortest paths

The F#.NET Journal just published an article about graph theory:

"Following on from our previous article about depth- and breadth-first searches of graphs, this article examines some more involved algorithms that also have many practical applications. A topological sort of a directed acyclic graph produces the vertices in a sequence where every vertex reachable from another appears after it in the sequence. All-pairs shortest paths finds the shortest path from any vertex to any other vertex..."

To read this article and more, subscribe to The F#.NET Journal today!

Memory leaks in asynchronous workflows

During the development of a concurrent distributed application for a client, we repeated a common mistake of tail recursing in an asynchronous workflow using the do! construct. For example:

let agent execute =
   new MailboxProcessor<_>(fun inbox ->
      let rec loop() = async {
         let! msg = inbox.Receive()
         execute msg
         do! loop()
         }
      loop())

This is problematic because using do! in this context leaks stack frames every time the workflows recurses (which is typically every time a message is processed). Moreover, F# uses a trampoline to implement this stack so the consequence is not a stack overflow but, rather, a gradual memory leak.

So if your concurrent applications appear to be leaking a few kilobytes for each message they process, search for instances of do! in tail position and replace them with return! like this:

let agent execute =
   new MailboxProcessor<_>(fun inbox ->
      let rec loop() = async {
         let! msg = inbox.Receive()
         execute msg
         return! loop()
         }
      loop())

Sunday, 3 October 2010

Representing and searching graphs

The F#.NET Journal just published an article about graph theory:

"Graph theory is an important branch of mathematics in the context of programming because important problems can often be phrased in terms of graphs. This articles takes a look at the representation of graphs in F# and introduces the basic operations of depth-first and breadth-first searches..."

To read this article and more, subscribe to The F#.NET Journal today!

Wednesday, 15 September 2010

Distributed message passing

The F#.NET Journal just published an article about message passing:

"The F# programming language makes message passing easy thanks to union types and pattern matching. Asynchronous workflows leverage this to provide Erlang-style agent-based programming. However, Erlang programs can be distributed across multiple machines transparently but asynchronous workflows work only in-process. This article examines the use of TCP sockets and serialization to communicate values between machines in order to use distributed message passing..."

To read this article and more, subscribe to The F#.NET Journal today!

F# lectures from the New England F# user group

The New England F# user group in the United States have put up a great list of video lectures and associated slides here.

Sunday, 12 September 2010

Simulating predator-prey relationships

The F#.NET Journal just published an article about biological simulation:

"This article describes the design and implementation of a Windows Presentation Foundation application that simulates and visualizes the evolution of the predator and prey populations over time. This involves the numerical integration of a differential equation and the resulting trajectory is visualized as a parametric plot using the new charting and graphing functionality provided with .NET 4. The parameters of the simulation can be adjusted by the user and the graph is updated in real time..."

To read this article and more, subscribe to The F#.NET Journal today!

Wednesday, 1 September 2010

Draft F# Component Design Guidelines

The F# team at Microsoft have kindly released a draft of their F# Component Design Guidelines. This document provides a wealth of advice for professional F# developers covering both F#-facing code and code exposed to other .NET languages.

Recommended reading for all F# programmers!

Sunday, 29 August 2010

Calculator

The F#.NET Journal just published an article about typeful programming:

"The pedagogical calculator sample demonstrates how a program can evaluate mathematical expressions originating from the user interface of a calculator. This article describes the design and implementation of a calculator application that uses Windows Presentation Foundation to provide a graphical user interface similar to that of a traditional calculator. In particular, every effort is made to leverage static checking in order to catch as many bugs at compile time as possible. This includes not only sophisticated union types to represent the state of the state machine but also exhaustiveness and redundancy checking of the advanced pattern matches that result as well as their factoring to reduce code size..."

To read this article and more, subscribe to The F#.NET Journal today!

Wednesday, 18 August 2010

Don Syme lecturing at the F#unctional Londoners Meetup

Creator of the F# programming language and all-around great guy Don Syme will be giving a lecture at the F#unctional Londoners Meetup, Skills Matter eXchange, on 9th September 2010.

Saturday, 7 August 2010

Jon Harrop lectures on Parallel QR Decomposition

The Skills Matter eXchange have kindly uploaded the video of my recent lecture as well as the accompanying slides.

The core of this lecture is an explanation of why purely functional code that looks embarrassingly parallel often does not scale and how scalability can be achieved through the use of mutation.

The example used is QR decomposition where pure code achieves less than 2× speedup on 8 cores but impure F# code achieves over 7× speedup and beats the reference implementation of BLAS and LAPACK with 99% less code. However, the lessons learned are not specific to this algorithm or even to numerical methods in general but apply to all software running on .NET.

For vastly more information on this subject, stay tuned for our forthcoming Multicore .NET book.

Wednesday, 4 August 2010

Data structures: heaps

The F#.NET Journal just published an article about data structures:

"A min-heap is a tree-based data structure that satisfies the constraint that the children at any given branch are always larger than their parent and, consequently, the minimum element is at the root. Heaps are most notable as an efficient way to implement priority queues which, in turn, underpin a variety of algorithms including some seen in previous F#.NET Journal articles. This article examines skew heaps, leftist heaps, pairing heaps and splay heaps..."

To read this article and more, subscribe to The F#.NET Journal today!

Tuesday, 27 July 2010

F#unctional Londoners meetup lecture (28th July 2010)

Zach Bray of Trayport and Jon Harrop of Flying Frog Consultancy Ltd. will be presenting lectures at Skills Matter eXchange London (UK) at 6:30pm on Wednesday 28th July 2010.

Many thanks to Carolyn Miller and Phil Trelford for organizing F#unctional Londoners Meetup Group, an excellent series of events!

Saturday, 17 July 2010

Histogram equalization

The F#.NET Journal just published an article about image processing and the new charting functionality in .NET 4:

"Over- and under-exposed images have their intensity distributions skewed to the high or low end of the range. Histogram equilization is one technique to combat this effect digitally by spreading out the distribution of intensities in an image via the intensity histogram. This article describes a program for image enhancement using histogram equalization with an interactive WPF-based GUI using the new charting functionality in .NET 4 to visualize the intensity histograms in real time..."

To read this article and more, subscribe to The F#.NET Journal today!

Monday, 5 July 2010

Happy numbers

Consider the sequence obtained by summing the squares of the digits of a number and then repeating with the result. If this sequence ends with the number one then the numbers in the sequence are Happy Numbers.

One of the challenges on Rosetta code is to compute the first eight Happy Numbers. The current F# solution weighs in at 26 lines of code. Let's see if we can do better...

We begin by writing a next function that computes the next happy number after the given number:

> let rec next (m: Set<_>) n i = if i=1 then n else if m.Contains i then next Set.empty (n+1) (n+1) else Seq.sumBy (fun d -> pown (int d - int '0') 2) (string i) |> next (m.Add i) n;; val next : Set<int> -> int -> int -> int

Note the use of the string function to convert an int into its string representation and then the use of the int function to convert each char back into an int.

Finally, we use the List.scan function to build a list of accumulated results as the next function finds each of the happy numbers in turn:

> List.scan (fun n _ -> next Set.empty (n+1) (n+1)) 1 [1..7];; val it : int list = [1; 7; 10; 13; 19; 23; 28; 31]

The scan function is relatively new and, consequently, not as widely appreciated as perhaps it should be.

Our whole program weighs in at only 6 lines of code, much shorter than the original!

Sunday, 4 July 2010

Lorenz attractor

The Lorenz attractor is a fractal derived from the trajectory of a 3-dimensional dynamical system that exhibits chaotic flow. This blog post describes a 35-line program that computes trajectories of this attractor and visualizes them as a beautiful whisp using Windows Presentation Foundation:

The program is as follows:

> #r "PresentationCore.dll";; > #r "PresentationFramework.dll";; > #r "WindowsBase.dll";; > #r "System.Xaml.dll";; > let trajectory f (x, y, z) dt n = let s, b, p = 10.0, 8.0 / 3.0, 28.0 let mutable x = x let mutable y = y let mutable z = z let a = Array.zeroCreate n for i=0 to n-1 do a.[i] <- f (x, y, z) x <- x + s * (y - x) * dt y <- y + (x * (p - z) - y) * dt z <- z + (x * y - b * z) * dt a;; val trajectory : (float * float * float -> 'a) -> float * float * float -> float -> int -> 'a [] > open System.Windows;; > let whisp() = let rand = System.Random() let f x = x + pown (rand.NextDouble() - 0.5) 2 let x, y, z = f 10.0, f 0.0, f 20.0 let xys = let f(x, y, z) = Point(20.0 + x, 25.0 + y - z + 50.0) trajectory f (x, y, z) 0.003 2000 let line_to (xy: Point) = (Media.LineSegment(xy, true) :> Media.PathSegment) Media.PathGeometry[ Media.PathFigure(xys.[0], Seq.map line_to xys, false) ];; val whisp : unit -> Media.PathGeometry > do let group = Media.GeometryGroup() for i=1 to 100 do whisp() |> group.Children.Add |> ignore let brush = Media.SolidColorBrush Media.Colors.Red let path = Shapes.Path(Data=group, Stroke=brush, StrokeThickness=0.005) let box = Controls.Viewbox(Child=path, Stretch=Media.Stretch.Uniform) let window = Window(Content=box, Title="Lorenz attractor") (Application()).Run window |> ignore;;

The trajectory function generates an array derived from the coordinates along a trajectory with the given starting position. The whisp function computes a trajectory with a slightly randomized starting position and generates a WPF PathGeometry. Finally, the main program generates 100 whisps and visualizes them.

Rigid body dynamics demo

The F#.NET Journal just made another downloadable demo freely available. This demo is from the article Rigid body dynamics (15th January 2010).

The article walks through the design and implementation of a 250-line program that simulates the dynamics of a collection of solid balls as they bound around a scene, using Windows Presentation Foundation for visualization!

Friday, 2 July 2010

Diff

The F#.NET Journal just published an article about text processing:

"The unix program diff identifies differences between text files, line by line. This tool is most useful for comparing two versions of a program. This article develops a simple implementation of the diff tool based upon the longest common subsequence (LCS) algorithm with a graphical user interface written using Windows Presentation Foundation to visualize the difference between two text files..."

To read this article and more, subscribe to The F#.NET Journal today!

Tuesday, 29 June 2010

Expert F# 2.0

Don Syme et al. have updated their seminal text on F#, bringing hardcore developers all the gory detail in the new Expert F# 2.0 book.

This is the second book to cover the latest version of F# that shipped with Visual Studio 2010 in April. The first book was, of course, our own Visual F# 2010 for Technical Computing.

Saturday, 19 June 2010

Parallelism in .NET 4 and Visual F# 2010

The F#.NET Journal just published an article about .NET 4:

"The latest version of the .NET Framework provides a wealth of functionality for parallel programming aimed at multicores. This includes not only the Task Parallel Library, that was previously available in the form of CTP releases, but also Parallel LINQ (PLINQ) and the new concurrent collections. The F# standard library has also been augmented with parallelized functions. This article examines each of these in turn..."

To read this article and more, subscribe to The F#.NET Journal today!

Sunday, 6 June 2010

A dragon curve in 17 lines of F#

Dragon curves are a family of self-similar fractal curves. The following 17-line F# program uses Windows Presentation Foundation to visualize the results of a simple recursively-constructed dragon curve:

open System.Windows open System.Windows.Media let m = Matrix(0.0, 0.5, -0.5, 0.0, 0.0, 0.0) let step segs = seq { for a: Point, b: Point in segs do let x = a + 0.5 * (b - a) + (b - a) * m yield! [a, x; b, x] } let rec nest n f x = if n=0 then x else nest (n-1) f (f x) [<System.STAThread>] do let ps = nest 14 step (seq [Point(0., 0.), Point(1., 0.)]) let d = Vector(Seq.min[for a, b in ps -> min a.X b.X], Seq.min[for a, b in ps -> min a.Y b.Y]) let lineTo p = (LineSegment(p, true) :> PathSegment) let path = Shapes.Path(Stroke=Brushes.Black, StrokeThickness=0.003) path.Data <- PathGeometry[for a, b in ps -> PathFigure(a-d, [lineTo(b-d)], false)] (Application()).Run(Window(Content=Controls.Viewbox(Child=path))) |> ignore

This program produces the following output:

Cache oblivious algorithms: Matrix multiply

The F#.NET Journal just published an article about cache oblivious algorithms:

"The widespread adoption of CPU caches around two decades ago forced a change in the way programmers traverse data structures when performance is of interest. The classic style was to iterate directly using for loops. Two new styles have emerged: either a cache aware set of loops that tile the dataset into fixed-size subsets that fit into a cache of a known size, or a cache oblivious style that uses divide and conquer to subdivide the problem until it fits into the cache regardless its size. This article describes the revolutionary idea of cache oblivious algorithms via the elegant and efficient implementation of a matrix multiply in F#. In particular, we demonstrate the importance of these techniques in the context of multicore programming...."

To read this article and more, subscribe to The F#.NET Journal today!

Tuesday, 25 May 2010

Mini hash table

Imagine you're stuck on a desert island with only a few keystrokes and you desperately need to create your own rudimentary hash table. What might you do?

Well, here's one possible solution: a function that takes a sequence of key-value pairs and builds a hash table, returning another function that searches the hash table for the given key in order to find its corresponding value:

> let hashtbl xs = let spine = [|for _ in xs -> ResizeArray()|] let f k = spine.[k.GetHashCode() % spine.Length] Seq.iter (fun (k, v) -> (f k).Add (k, v)) xs fun k -> Seq.pick (fun (k', v) -> if k=k' then Some v else None) (f k);; val hashtbl : seq<'a * 'b> -> ('a -> 'b) when 'a : equality

Here's how you might use it to look up some squares:

> let tbl = hashtbl [for x in 0..100 -> x, x * x];; val tbl : (int -> int) > tbl 4;; val it : int = 16 > tbl 12;; val it : int = 144

Thursday, 20 May 2010

Quicksort

The F#.NET Journal just published an article about sorting:

"The quicksort algorithm was invented by Tony Hoare in 1960 and remains one of the most celebrated algorithms and is still of great practical value. Implementing some of the many variations of the quicksort algorithm serves as an excellent introduction to mixed-paradigm programming in F# and the implementation of a production-quality implementation benefits enormously from the use of a substantial number of exotic features of the F# language. Moreover, the quicksort algorithm is amenable to parallelization, which is of increasing relevance in the multicore era, so the performance characteristics of parallelizations of the implementations are also of great interest..."

To read this article and more, subscribe to The F#.NET Journal today!

Friday, 14 May 2010

Java vs F#

Dr Cliff Click of Azul Systems, specialists in manycore JVM systems, recently published a blog post about the performance of Java compared primarily to C and C++ but also discussing C# and .NET. Three of Cliff's comments are of particular interest:

Under the heading "Places where C/C++ beats Java for obvious reasons":

"Value Types, such as a 'Complex' type require a full object in Java." - Dr Cliff Click

What Cliff forgot to mention is that .NET also provides value types and a far more compelling example than complex numbers is the humble hash table.

Consider the task of filling a hash table with 10,000,000 associations from integers to single-precision floating point numbers. This may be accomplished in Java as follows:

package hashtablebenchmark; import java.util.HashMap; public class Main { public static void main(String[] args) { int n = 10000000; for (int j=0; j<10; ++j) { long startTime = System.currentTimeMillis(); HashMap hashtable = new HashMap(n); for(int i=1; i<=n; ++i) { hashtable.put(i, 1.0f / i); } System.out.println("m[100] = " + hashtable.get(100)); long time = System.currentTimeMillis() - startTime; System.out.println("Took: " + time / 1e3 + "s"); } } }

The equivalent program in F# is not only shorter but also 17× faster:

let n = 10000000 let m = System.Collections.Generic.Dictionary(n) for i=1 to n do m.[i] <- 1.0f / float32 i printf "m[100] = %f\n" m.[100]

Specifically, Java takes 6.967s initially and 5.733s steady state whereas F# takes only 0.414s.

In fact, F# completes this benchmark so quickly that we would like to give it more work but Java cannot handle any more work without running out of memory on this 4Gb machine.

Elsewhere, Cliff also writes of Java:

"Very Good Multi-Threading Support. Parallel programming is just easier in Java." - Dr Cliff Click

and later:

"Not that I track C# all that closely but... I believe the JIT produces substantially slower code than Java" - Dr Cliff Click

Allow us to demonstrate the otherwise. The Computer Language Shootout contains a well-formed benchmark called spectral-norm for which the fastest Java solution is a 173-line parallel program. This implementation may be rewritten in only 24 lines of F#:

let A i j = 1.0 / float((i + j) * (i + j + 1) / 2 + i + 1) let inline mul A (u: _ []) (v: _ []) = System.Threading.Tasks.Parallel.For(0, v.Length, fun i -> let mutable vi = 0.0 for j = 0 to v.Length - 1 do vi <- vi + A i j * u.[j] v.[i] <- vi) |> ignore let AtAu u v = let w = Array.create (Array.length u) 0.0 mul (fun i j -> A i j) u w mul (fun i j -> A j i) w v do let n = 5500 let u, v = Array.create n 1.0, Array.create n 0.0 for i = 0 to 9 do AtAu u v AtAu v u let u, v = vector u, vector v printf "%0.9f\n" (sqrt(Vector.dot u v / Vector.dot v v))

In the Java, dozens of lines of code were devoted to parallelism. In contrast, only two lines of code were altered to parallelize the F#. So it would seem that parallel programming is not "just easier in Java".

The the serial Java takes 12.722s initially and 12.299s steady state whereas F# takes only 12.18s from a cold start. On this 8-core 2xE5405 2.0GHz Xeon, the parallel Java takes 1.839s initially and 1.820s steady state whereas the parallel F# takes only 1.60s from a cold start. The fact that Java is slower in every single case indicates that the CLR's JIT is not "producing substantially slower code than Java".

Finally, Cliff made no mention of two other design deficiencies that cripple Java's performance. Firstly, their type erasure approach to generics incurs massive performance degradation for a lot of generic code because it leads to unnecessary boxing. Secondly, the JVM's lack of tail call elimination is not only a growing hindrance in the era of functional programming but the only general-purpose workaround, trampolines, is typically 10× slower that necessary.

If you want to learn how to write fast parallel programs in F#, read our book Visual F# 2010 for Technical Computing.

Thursday, 6 May 2010

Parallelizing the SciMark2 benchmark: part 2

The F#.NET Journal just published an article about parallel programming:

"The SciMark2 benchmark is one of the better benchmarks for programming languages and their implementations in the context of technical computing. This article is the second in a two-part series revisiting the SciMark2 benchmark to examine the parallelization of each of its component tasks. Specifically, the Monte-Carlo, sparse matrix-vector multiplication and LU decomposition tests. The results are of general interest in the context of improving performance on multicores using parallel programming..."

To read this article and more, subscribe to The F#.NET Journal today!

Monday, 19 April 2010

Visual F# 2010 key binding cheat sheet

Download your copy of Microsoft's official F# key bindings cheat sheet here.

"by moving to F# we reduced code size by nearly 75 percent"

Grange Insurance of Columbus OH recently published a compelling case study regarding their use of Microsoft's new Visual F# 2010 programming language.

They cite clarity as a major advantage of F#:

"We liked that Visual F# is tailored to highly mathematical problems, helping programmers work more closely to the problem domain and enabling actuaries and other nonprogrammers to review the code as they would a math formula,"

and parallelism:

"We liked the .NET Parallel Extensions available through F# for managing parallelism automatically through the runtime system."

and integrated development environment and interoperability with existing technologies:

"We also liked that Visual F# is an integrated part of the Visual Studio tool set and the .NET Framework. That meant our programmers could use the same IDE and powerful .NET libraries they knew from having developed the earlier version of the solution in Visual C#."

For other F# success stories, read the freely available first article from The F#.NET Journal.

Visual F# 2010 for Technical Computing

Following the ground-breaking product release of Visual F# by Microsoft as part of Visual Studio 2010, we have updated our F# for Technical Computing book to cover both F# 2.0 and .NET 4.

Visual F# 2010 for Technical Computing includes a variety of changes throughout the book, some affecting changes to the language and others affecting the use of libraries. The three most prominent changes are:

  • Chapter 10: Concurrent Programming now covers the final production-quality implementation of asynchronous workflows in the F# 2.0 language.

  • Chapter 11: Parallel Programming now replaces the TPL CTP with the constructs for shared-memory parallelism that have been integrated into .NET 4. Both the API and the performance characteristics have changed considerably.

  • Chapter 12: Performance has been completely revamped due to massive improvements in the F# compiler and the new garbage collector in .NET 4.

One particular surprise was the substantial difference in performance characteristics between Visual Studio 2010 and the F# prereleases. Several of the benchmarks covered in the book show 400% changes!

kick it on DotNetKicks.com

Thursday, 15 April 2010

The A* algorithm

The F#.NET Journal just published an article about route finding:

"The A* algorithm is often used for route finding in game AI and is a generalization of Dijkstra's shortest-path algorithm from graph theory. This article describes two generic implementations of the A* algorithm written in F#. The first implementation is a simple prototype written in a purely functional style using data structures provided by F#. The second implementation uses mutable .NET collections to perform the same computation substantially more quickly. Finally, an example application is provided that finds a route across the Grand Canyon..."

To read this article and more, subscribe to The F#.NET Journal today!

Wednesday, 14 April 2010

Visual Studio 2010 with F# out now!

MSDN subscribers can now download Visual Studio 2010 that provides the first fully-supported product release of the new F# programming language. Don Syme, the creator of F# at Microsoft Research Cambridge, posted an emotional blog article accompanying this release as it represents the culmination of 7 years of work.

We shall be updating all of our F#-related products to work with the new product release and, in particular, our F# for Numerics and F# for Visualization software libraries will now see their first product releases as well. Future articles in the F#.NET Journal will cover every aspect of Visual Studio 2010 in detail.

Note that there is no Visual F# Express edition and, at the time of writing, there is no user-friendly way to develop F# on .NET 4 for free. However, you can develop F# 2.0 on .NET 3 using Visual Studio 2008 or compile F# 2.0 for .NET 4 using the command-line compilers. Brian McNamara of the F# team has encouraged anyone wanting Visual F# Express to upvote this motion. Hopefully we will see another Visual Studio 2010 Shell with separate F# release that makes it easy to develop F# software with the latest libraries for free. The F# team are also committed to open source cross-platform development, to the extent that they are hiring a new contractor specifically to help with this.

Sunday, 11 April 2010

Book review: F# for Scientists

Our 2006 book F# for Scientists has received its ninth review on Amazon and, like all previous reviews, was given the maximum 5-star rating!

Andre M. Van Meulebrouck from California writes:

A hallmark of this book is conciseness. (The book itself is fairly small and thin; and nicely hardbound.)

This book is a gold mine of great information that could take years to fully digest!

While the book is titled as a scientific book, and it is that; it also has much more to offer. It should be of great interest to scientists, mathematicians, statisticians, computer scientists, financial programmers, and any programmers who want to write good code. It features a well balanced selection of topics including: algorithms, data structures, visualization, graphics, threading, performance, and optimization. The use of DirectX is demonstrated. Some compilation techniques are also shown.

A nice selection of recursive list algorithms are presented that showcase the kind of problem solving that can be done purely with recursion and list processing. These are classic idioms that are good to be exposed to; like power set, and substitution with replacement.

Many of the examples are very much in the spirit of the Scheme Revised Reports, wherein the most gutted possible examples are used to demonstrate a given primitive or concept. Nothing extraneous to cause distractions.

There is a complement to this book called "F# for Technical Computing" that can be purchased from Flying Frog Consultancy. The complementary book adds nicely to the material in "F# for Scientists"; with discussions on such topics as parallel computing and WPF. In addition, the complementary book features longer page sizes, a stay flat (music book style) binding, and color; all of which I really like. (I wish more technical books made use of color because code is much easier to read when you see comments in one color, keywords in another, etc..)

Both books are gems. There are also counterparts to these books for OCaml programmers.

Relevant software can also be obtained from the Flying Frog Consultancy (which has, as part of its logo: "Putting the fun in functional since 2005").

Saturday, 3 April 2010

An e-mail client in F#

The F#.NET Journal just published an article about the design and implementation of a complete GUI application:

"An e-mail client is an application that checks a remote mailbox for incoming messages and allows the user to send newly composed messages and replies to received messages. Incoming e-mails are typically read using the Post Office Protocol version 3 (POP3) which is a simple plain-text protocol implemented over TCP sockets. Outgoing e-mails are sent using the Simple Message Transfer Protocol (SMTP), an implementation of which is provided by the .NET framework. This article describes the design and implementation of an e-mail client that can be used for basic e-mail handling but, in particular, is easily programmed to perform tasks such as transaction processing automatically..."

To read this article and more, subscribe to The F#.NET Journal today!

Thursday, 18 March 2010

.NET 4's improved tail call elimination

Existing versions of the .NET Framework sometimes unnecessarily fail to eliminate tail calls. Grant Richins of Microsoft recently published a blog post explaining how this has been addressed in .NET 4. The improvements make it entirely feasible to write idiomatic pure and impure functional F# programs relying upon tail call elimination without having to worry about stack overflows.

Such changes in the very fabric of .NET also demonstrate just how committed Microsoft are to their new F# programming language.

New VS2010 release date

Microsoft have updated the product page for Visual Studio 2010 (which includes F#!) with a new release date of 12th April 2010 under the "Buy" section.

24 days and counting!!!

Tuesday, 16 March 2010

Writing an article editor with Windows Presentation Foundation

The F#.NET Journal just published an article about the design and implementation of a complete GUI application:

"This article describes the article editing tool we are creating to help automate the process of generating journal content. The editor provides a simple text editor where the author writes a form of markdown. The markdown is interactively parsed and pretty printed in a separate pane and the editor allows the result to be exported to HTML for inclusion in the journal. The application is based entirely upon Windows Presentation Foundation..."

To read this article and more, subscribe to The F#.NET Journal today!

Wednesday, 10 March 2010

F# vs Unmanaged C++ for parallel numerics

We obtained a surprising performance result when comparing optimized parallel ray tracers written in F# and C++ recently. The following two programs render the same highly complex scenes containing over a million objects. Surprisingly, the 136-line managed F# program runs slightly faster at 17s than the 168-line unmanaged C++ which takes 18s.

Here is the F# program:

let t = System.Diagnostics.Stopwatch.StartNew() let delta = sqrt epsilon_float [<Struct>] type vec = val x : float val y : float val z : float new(x, y, z) = {x=x; y=y; z=z} static member ( + ) (a : vec, b : vec) = new vec(a.x+b.x, a.y+b.y, a.z+b.z) static member ( - ) (a : vec, b : vec) = new vec(a.x-b.x, a.y-b.y, a.z-b.z) static member ( * ) (a, b : vec) = new vec(a*b.x, a*b.y, a*b.z) let vec x y z = new vec(x, y, z) let dot (a : vec) (b : vec) = a.x * b.x + a.y * b.y + a.z * b.z let length r = sqrt(dot r r) let unitise r = 1. / length r * r type scene = Scene of vec * float * scene_type and scene_type = | Sphere | Group of scene * scene * scene * scene * scene let infinity = System.Double.PositiveInfinity let inline ray_sphere (d : vec) (v : vec) r = let vx = v.x let vy = v.y let vz = v.z let disc = vx * vx + vy * vy + vz * vz - r * r if disc < 0. then infinity else let b = vx * d.x + vy * d.y + vz * d.z let b2 = b * b if b2 < disc then infinity else let disc = sqrt(b2 - disc) let t1 = b - disc in if t1 > 0. then t1 else b + disc let ray_sphere' (o : vec) (d : vec) (c : vec) r = let vx = c.x - o.x let vy = c.y - o.y let vz = c.z - o.z let vv = vx * vx + vy * vy + vz * vz let b = vx * d.x + vy * d.y + vz * d.z let disc = b * b - vv + r * r disc >= 0. && b + sqrt disc >= 0. type hit = {mutable l: float; mutable nx: float; mutable ny: float; mutable nz: float} let intersect_sphere (dir: vec) hit l' (center: vec) = let x = l' * dir.x - center.x let y = l' * dir.y - center.y let z = l' * dir.z - center.z let il = 1. / sqrt(x * x + y * y + z * z) hit.l <- l' hit.nx <- il * x hit.ny <- il * y hit.nz <- il * z let rec intersect dir hit = function | Scene(center, radius,typ) -> let l' = ray_sphere dir center radius if l' < hit.l then match typ with | Sphere -> intersect_sphere dir hit l' center | Group(a, b, c, d, e) -> intersect dir hit a intersect dir hit b intersect dir hit c intersect dir hit d intersect dir hit e let rec intersect' orig dir = function | Scene(center, radius, typ) -> ray_sphere' orig dir center radius && match typ with | Sphere -> true | Group (a, b, c, d, e) -> intersect' orig dir a || intersect' orig dir b || intersect' orig dir c || intersect' orig dir d || intersect' orig dir e let neg_light = unitise(vec 1. 3. -2.) let rec ray_trace dir scene = let hit = {l=infinity; nx=0.; ny=0.; nz=0.} intersect dir hit scene; if hit.l = infinity then 0. else let n = vec hit.nx hit.ny hit.nz in let g = dot n neg_light in if g < 0. then 0. else if intersect' (hit.l * dir + delta * n) neg_light scene then 0. else g let fold5 f x a b c d e = f (f (f (f (f x a) b) c) d) e let rec create level c r = let obj = Scene(c, r,Sphere) if level = 1 then obj else let a = 3. * r / sqrt 12. let rec bound (c, r) = function | Scene(c', r',Sphere) -> c, max r (length (c - c') + r') | Scene(_, _, Group(v, w, x, y, z)) -> fold5 bound (c, r) v w x y z let aux x' z' = create (level - 1) (c + vec x' a z') (0.5 * r) let w, x, y, z = aux a (-a), aux (-a) (-a), aux a a, aux (-a) a let c, r = fold5 bound (c + vec 0. r 0., 0.) obj w x y z in Scene(c, r, Group(obj, w, x, y, z)) let level, n = 11, 2048 let scene = create level (vec 0. -1. 4.) 1. let ss = 4 do use ch = System.IO.File.Create("C:\Users\Jon\Documents\Visual Studio 10\Projects\F#\RayTracer\image.pgm", n * n + 32) sprintf "P5\n%d %d\n255\n" n n |> String.iter (ch.WriteByte << byte) let image = Array.create (n*n) 0uy System.Threading.Tasks.Parallel.For(0, n, fun y -> for x = 0 to n - 1 do let mutable g = 0. for dx = 0 to ss - 1 do for dy = 0 to ss - 1 do let aux x d = float x - 0.5 * float n + float d / float ss let dir = unitise(vec (aux x dx) (aux y dy) (float n)) g <- g + ray_trace dir scene image.[x+y*n] <- 0.5 + 255. * g / float (ss*ss) |> byte) |> ignore ch.Write(image, 0, n*n) printf "Took %gs" t.Elapsed.TotalSeconds

And here is the C++ program:

#include "stdafx.h" #include <vector> #include <iostream> #include <fstream> #include <limits> #include <cmath> using namespace std; numeric_limits<double> real; double delta = sqrt(real.epsilon()), infinity = real.infinity(); struct Vec { double x, y, z; Vec(double x2, double y2, double z2) : x(x2), y(y2), z(z2) {} }; Vec operator+(const Vec &a, const Vec &b) { return Vec(a.x+b.x, a.y+b.y, a.z+b.z); } Vec operator-(const Vec &a, const Vec &b) { return Vec(a.x-b.x, a.y-b.y, a.z-b.z); } Vec operator*(double a, const Vec &b) { return Vec(a*b.x, a*b.y, a*b.z); } double dot(const Vec &a, const Vec &b) { return a.x*b.x + a.y*b.y + a.z*b.z; } double length(const Vec &a) { return sqrt(dot(a, a)); } Vec unitise(const Vec &a) { return (1 / sqrt(dot(a, a))) * a; } struct Hit { double lambda; Vec normal; Hit() : lambda(infinity), normal(Vec(0, 0, 0)) {} }; struct Sphere; struct Scene { virtual ~Scene() {}; virtual void intersect(Hit &, const Vec &) const = 0; virtual bool intersect(const Vec &, const Vec &) const = 0; virtual Sphere bound(Sphere b) const = 0; }; struct Sphere : public Scene { Vec center; double radius; Sphere(Vec c, double r) : center(c), radius(r) {} ~Sphere() {} double ray_sphere(const Vec &dir) const { double b = center.x*dir.x + center.y*dir.y + center.z*dir.z; double disc = b*b - dot(center, center) + radius * radius; if (disc > 0) { double d = sqrt(disc), t2 = b + d; if (t2 > 0) { double t1 = b - d; return (t1 > 0 ? t1 : t2); } } return infinity; } bool sray_sphere(const Vec &orig, const Vec &dir) const { Vec v = center - orig; double b = dot(v, dir), disc = b*b - dot(v, v) + radius * radius; return (disc < 0 ? false : b + sqrt(disc) >= 0); } void intersect(Hit &hit, const Vec &dir) const { double lambda = ray_sphere(dir); if (lambda < hit.lambda) { hit.lambda = lambda; double nx = lambda*dir.x - center.x, ny = lambda*dir.y - center.y, nz = lambda*dir.z - center.z; double il = 1/sqrt(nx*nx + ny*ny + nz*nz); hit.normal.x = il*nx; hit.normal.y = il*ny; hit.normal.z = il*nz; } } bool intersect(const Vec &orig, const Vec &dir) const { return sray_sphere(orig, dir); } Sphere bound(Sphere b) const { return Sphere(b.center, max(b.radius, length(center - b.center) + radius)); } }; typedef vector<Scene *> Scenes; struct Group : public Scene { Sphere b; Scenes child; Group(Sphere b, Scenes c) : b(b), child(c) {} ~Group() { for (Scenes::const_iterator it=child.begin(); it!=child.end(); ++it) delete *it; } void intersect(Hit &hit, const Vec &dir) const { if (b.ray_sphere(dir) < hit.lambda) for (Scenes::const_iterator it=child.begin(); it!=child.end(); ++it) (*it)->intersect(hit, dir); } bool intersect(const Vec &orig, const Vec &dir) const { if (!b.sray_sphere(orig, dir)) return false; for (Scenes::const_iterator it=child.begin(); it!=child.end(); ++it) if ((*it)->intersect(orig, dir)) return true; return false; } Sphere bound(Sphere b) const { Sphere b2 = b; for (Scenes::const_iterator it=child.begin(); it!=child.end(); ++it) b2 = (*it)->bound(b2); return b2; } }; double ray_trace(const Vec &neg_light, const Vec &dir, const Scene &s) { Hit hit; s.intersect(hit, dir); if (hit.lambda == infinity) return 0; double g = dot(hit.normal, neg_light); if (g < 0) return 0.; Vec p = hit.lambda*dir + delta*hit.normal; return (s.intersect(p, neg_light) ? 0 : g); } Scene *create(int level, Vec c, double r) { Scene *s = new Sphere(c, r); if (level == 1) return s; Scenes child; child.reserve(5); child.push_back(s); double rn = 3*r/sqrt(12.); for (int dz=-1; dz<=1; dz+=2) for (int dx=-1; dx<=1; dx+=2) child.push_back(create(level-1, c + rn*Vec(dx, 1, dz), r/2)); Sphere b2(c + Vec(0, r, 0), r); for (Scenes::const_iterator it=child.begin(); it!=child.end(); ++it) b2 = (*it)->bound(b2); return new Group(b2, child); } int _tmain(int argc, char* argv[]) { int level = (argc==3 ? atoi(argv[1]) : 11), n = (argc==3 ? atoi(argv[2]) : 2048), ss = 4; Vec neg_light = unitise(Vec(1, 3, -2)); Scene *s(create(level, Vec(0, -1, 4), 1)); ofstream out; std::vector<char> image(n*n); #pragma omp parallel for for (int y=0; y<n; ++y) for (int x=0; x<n; ++x) { double g=0; for (int dx=0; dx<ss; ++dx) for (int dy=0; dy<ss; ++dy) { Vec dir(unitise(Vec(x+dx*1./ss-n/2., y+dy*1./ss-n/2., n))); g += ray_trace(neg_light, dir, *s); } image[x+(n-1-y)*n] = .5 + 255. * g / (ss*ss); } out.open("C:\\Users\\Jon\\Documents\\image.pgm", std::ios_base::binary); out << "P5\n" << n << " " << n << "\n255\n"; out.write(&image[0], n*n); return 0; }