Monday, 22 February 2010

The BookWorm challenge

Chris Smith and Brian McNamara of Microsoft have been squabbling like mildly-naughty henchmen in recent blog posts over F# solutions to the BookWorm challenge, a word search on a hexagonal board:

Despite their best efforts, their results make them look more like plucky comic relief than genuinely evil. We'll save you the bluster and focus on destroying their entire line of thinking. Suffice to say, our blog post should be read with an English accent.

We present a solution that is simultaneously less than half the length of their shortest solution and over eight times faster than their fastest solution on the largest example problem they have given to date. Moreover, our solution requires no search data structure and, consequently, uses far less memory. In fact, a simple alteration can even evade the need to store the entire dictionary, reducing the total memory consumption to only a few bytes for each word and the puzzle itself.

To date, Chris and Brian have developed in a single direction: using a trie data structure to perform asymptotically fast string searches. This sounds like a good idea but, in fact, the sole purpose of a trie is to be asymptotically efficient in the limit of infinite-length keys. Consequently, tries work wonders when your keys are giga-length strings such as DNA sequences but, as Chris noted, the keys for this problem are English words under 16 characters in length. As such small strings easily fit into a cache line, traversing them directly is very fast. In contrast, a tree-based search data structure that incurs an indirection at every character imposes a much higher constant prefactor in the time taken. Moreover, their optimizations have come at the expense of correctness because they lost the ability to handle board positions containing multiple characters such as Qu in their own example board pictured above.

Our program uses a simple string array array to represent the board such that positions may contain multiple characters and a string array to represent the dictionary. Each word is searched for in the board in turn and those found are printed and, finally, the total number of words found in the board is printed. The entire program is around 40 lines of code and the whole process, including loading the dictionary, takes only 0.9s on this 8-core machine. Here is the complete program:

let data = [|[|"O"; "U"; "B"; "N"; "O"; "V"; "S"|]; [|"J"; "A"; "P"; "A"; "L"; "O"; "M"|]; [|"G"; "T"; "E"; "J"; "T"; "S"; "H"|]; [|"U"; "A"; "E"; "O"; "F"; "S"; "A"|]; [|"N"; "O"; "A"; "E"; "L"; "E"; "P"|]; [|"O"; "S"; "O"; "U"; "G"; "N"; "P"|]; [|"A"; "T"; "F"; "G"; "A"; "I"; "I"|]; [|"W"; "U"; "B"; "N"; "O"; "V"; "S"|]; [|"J"; "E"; "P"; "A"; "L"; "O"; "M"|]; [|"G"; "S"; "E"; "J"; "T"; "Y"; "C"|]; [|"U"; "A"; "O"; "O"; "F"; "Y"; "R"|]; [|"N"; "O"; "M"; "E"; "L"; "U"; "T"|]; [|"O"; "S"; "O"; "E"; "G"; "E"; "I"|]; [|"H"; "T"; "F"; "N"; "A"; "I"; "Y"|]; [|"O"; "U"; "B"; "N"; "E"; "V"; "S"|]; [|"J"; "A"; "P"; "A"; "S"; "O"; "M"|]; [|"G"; "T"; "E"; "J"; "T"; "S"; "C"|]; [|"U"; "A"; "E"; "O"; "F"; "Y"; "R"|]; [|"N"; "O"; "A"; "E"; "L"; "U"; "T"|]; [|"O"; "S"; "O"; "U"; "G"; "E"; "I"|]; [|"H"; "T"; "F"; "G"; "A"; "I"; "Y"|]|] let rows, cols = data.Length, data.[0].Length let read file = System.IO.File.ReadAllLines(@"C:\Users\Jon\Documents\" + file) let words = let ignored = HashSet(read "IgnoredWords.txt", HashIdentity.Structural) [|for word in read "TWL06.txt" do if word.Length > 2 && not(ignored.Contains word) then yield word|] let neighbors = [|for y in 0..rows-1 -> [|for x in 0..cols-1 -> [|for dx, dy in [-1, -1; 0, -1; -1, 0; 1, 0; -1, 1; 0, 1] do let x, y = x + dx + (if y%2=1 && dy<>0 then 1 else 0), y + dy if x>=0 && x<cols && y>=0 && y<rows then yield x, y|]|]|] let rec eq (word: string) i (elt: string) j = j = elt.Length || i < word.Length && word.[i] = elt.[j] && eq word (i+1) elt (j+1) let rec search (visited: _ [] []) word i (x, y) = let elt = data.[y].[x] eq word i elt 0 && let i = i + elt.Length i = word.Length || (visited.[y].[x] <- true let pred(x, y) = not visited.[y].[x] && search visited word i (x, y) let result = Array.exists pred neighbors.[y].[x] visited.[y].[x] <- false result) let xys = [|for y in 0..rows-1 do for x in 0..cols-1 do yield x, y|] let contains (word: string) = let visited = Array.map (Array.map (fun _ -> false)) data if Array.exists (search visited word 0) xys then Some word else None do let words = Array.Parallel.choose contains words Array.iter (printfn "%s") words printfn "%d words" words.Length

Sunday, 21 February 2010

Sieve of Eratosthenes

The infinite sequence of prime numbers can be defined easily but inefficiently in F# using Haskell-style infinite lazy lists:

> #r "FSharp.PowerPack.dll";; --> Referenced 'C:\Program Files\FSharpPowerPack-1.9.9.9\bin\FSharp.PowerPack.dll' > open LazyList;; > let rec primes = unfold (fun n -> Some(n, n+1)) 3 |> filter isPrime |> cons 2 and isPrime n = Seq.takeWhile (fun d -> d*d <= n) primes |> Seq.forall (fun d -> n%d <> 0);; val isPrime : int -> bool val primes : LazyList<int> > #time;; --> Timing now on > Seq.nth 100000 primes;; Real: 00:00:05.251, CPU: 00:00:05.241, GC gen0: 283, gen1: 47, gen2: 1 val it : int = 1299721

Find the 100,000th prime using this algorithm took over 5s. The Sieve of Eratosthenes is a much more efficient algorithm and may be implemented in F# in only 18 lines of code as follows:

> let primes = let a = ResizeArray[2] let grow() = let p0 = a.[a.Count-1]+1 let b = Array.create p0 true for di in a do let rec loop i = if i<b.Length then b.[i] <- false loop(i+di) let i0 = p0/di*di loop(if i0<p0 then i0+di-p0 else i0-p0) for i=0 to b.Length-1 do if b.[i] then a.Add(p0+i) fun n -> while n >= a.Count do grow() a.[n];; val primes : (int -> int) > primes 100000;; Real: 00:00:00.025, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0 val it : int = 1299721 > primes 10000000;; Real: 00:00:15.855, CPU: 00:00:15.958, GC gen0: 2, gen1: 2, gen2: 2 val it : int = 179424691

Note that our implementation produces an arbitrarily-extensible sequence and, therefore, can be used to compute primes of any size (up to machine precision, of course). In contrast, most imperative implementations of the Sieve of Eratosthenes only compute prime numbers up to a preset limit.

Computing the 100,000th prime now takes only 0.025s and, in fact, we can compute the 10 millionth prime in only 16 seconds.

For comparison, Melissa O'Neill published a research paper about the Sieve of Eratosthenes in Haskell. Melissa's simplest solution is orders of magnitude slower than ours and her fastest solution is an order of magnitude longer.

F# for Visualization 0.4.0.3 released

A new version of our F# for Visualization library is now available to customers. This version works with the latest F# 1.9.9.9 in Visual Studio 2008 with the new F# PowerPack.

Note that our products do not support Microsoft's CTP, beta and RC releases but users of our library will, of course, get a supported upgrade to VS2010 RTM when it becomes available (22nd March 2010).

F# for Numerics 0.2.0.5 released

A new version of our F# for Numerics library is now available to customers. This version works with the latest F# 1.9.9.9 in Visual Studio 2008 with the new F# PowerPack.

Note that our products do not support Microsoft's CTP, beta and RC releases but users of our library will, of course, get a supported upgrade to VS2010 RTM when it becomes available (12th April 2010).

Friday, 19 February 2010

Channel 9 lectures by Don Syme: part 3

Microsoft's Channel 9 video blog has published the third lecture in a three-part series about the F# programming language by the creator of F# and .NET generics, Don Syme. This lecture covers parallel and asynchronous programming and units of measure.

This lecture series has already been very well received with over 78,000 views!

Channel 9 is Microsoft's free Learning Center for free technical training on emerging Microsoft products and technologies. This resource provides a set of courses with each course including a set of videos, hands-on labs, and source code samples to get you up-to-speed quickly.

Morris sequences

The Morris sequence, also known as the "see and say" or "look and say" sequence, is a sequence of sequences of integers where each subsequent sequence is a run-length encoding of the previous sequence. In other words, a run of identical integers in the previous sequence is encoded as two new elements in the next sequence: a number of repetitions and the repeated element itself.

Tony Lee posed the construction of an elegant F# program to generate the Morris sequence as an interesting challenge on Stack Overflow. Our proposed solution is as follows:

> let next (xs: ResizeArray<_>) = let ys = ResizeArray() let add n x = if n > 0 then ys.Add n ys.Add x let mutable n = 0 let mutable x = 0 for i=0 to xs.Count-1 do let x' = xs.[i] if x=x' then n <- n + 1 else add n x n <- 1 x <- x' add n x ys;; val next : ResizeArray<int> -> System.Collections.Generic.List<int> > let morris = Seq.unfold (fun xs -> Some((xs :> seq<int>), next xs)) (ResizeArray [1]);; val morris : seq<seq<int>> > morris;; val it : seq<seq<int>> = seq [seq [1]; seq [1; 1]; seq [2; 1]; seq [1; 2; 1; 1]; ...]

This solution is orders of magnitude faster than the other presented on Stack Overflow at the time of writing. The style wraps a low-level imperative solution for generating each subsequence in a high-level purely functional sequence. However, our solution is not as lazy as the original and, consequently, computes more than necessary when only the start of a subsequence is required.

Wednesday, 17 February 2010

F# development with Visual Studio 2010 and .NET 4

The F#.NET Journal just published an article about the latest version of F# and Visual Studio:

"Microsoft recently released Visual Studio 2010 Release Candidate (RC). The next release will be the official RTM so this is the perfect opportunity to see what improvements have been made to both Visual Studio, The .NET Framework and F# itself. Topics covered include recent changes to the F# language, F# integration in Visual Studio 2010, new functionality bundled in .NET 4 (including numerics and parallelism) and improvements in the CLR itself..."

To read this article and more, subscribe to The F#.NET Journal today!

Sunday, 14 February 2010

Anagrams

Another programming challenge is to compute anagrams from a dictionary and output the longest sets of anagrams first. We submitted the following elegant 3-line F# solution:

let words = System.IO.File.ReadAllLines @"unixdict.txt" [|for _, words in Seq.groupBy (Array.ofSeq >> Array.sort) words -> Array.ofSeq words|] |> Array.sortBy (fun s -> -Seq.length s)

This tiny program produces the following sets of anagrams in only half second:

val it : string [] [] = [|[|"alger"; "glare"; "lager"; "large"; "regal"|]; [|"caret"; "carte"; "cater"; "crate"; "trace"|]; [|"abel"; "able"; "bale"; "bela"; "elba"|]; [|"elan"; "lane"; "lean"; "lena"; "neal"|]; [|"angel"; "angle"; "galen"; "glean"; "lange"|]; [|"evil"; "levi"; "live"; "veil"; "vile"|]; [|"aden"; "dane"; "dean"; "edna"|]; [|"emit"; "item"; "mite"; "time"|]; [|"pare"; "pear"; "rape"; "reap"|]; [|"esprit"; "priest"; "sprite"; "stripe"|]; [|"hare"; "hear"; "hera"; "rhea"|]; [|"resin"; "rinse"; "risen"; "siren"|]; [|"keats"; "skate"; "stake"; "steak"|]; [|"lascar"; "rascal"; "sacral"; "scalar"|]; [|"amen"; "mane"; "mean"; "name"|]; [|"nepal"; "panel"; "penal"; "plane"|]; [|"lemon"; "melon"; "menlo"; "monel"|]; [|"least"; "slate"; "stale"; "steal"|]; [|"leap"; "pale"; "peal"; "plea"|]; [|"ames"; "mesa"; "same"; "seam"|]; [|"leapt"; "petal"; "plate"; "pleat"|]; [|"lien"; "line"; "neil"; "nile"|]; [|"abet"; "bate"; "beat"; "beta"|]; [|"mate"; "meat"; "tame"; "team"|]; [|"beard"; "bread"; "debar"; "debra"|]; [|"lament"; "mantel"; "mantle"; "mental"|]; [|"aires"; "aries"; "arise"; "raise"|]; [|"enol"; "leon"; "lone"; "noel"|]; [|"cereus"; "recuse"; "rescue"; "secure"|]; [|"manor"; "moran"; "norma"; "roman"|]; [|"latus"; "sault"; "talus"; "tulsa"|]; [|"diet"; "edit"; "tide"; "tied"|]; [|"lima"; "mail"; "mali"; "mila"|]; [|"are"; "ear"; "era"; "rae"|]; [|"apt"; "pat"; "pta"; "tap"|]; [|"dare"; "dear"; "erda"; "read"|]; [|"ate"; "eat"; "eta"; "tea"|]; [|"ant"; "nat"; "tan"|]; [|"yates"; "yeast"; "yeats"|]; [|"nerve"; "never"; "verne"|]; [|"ether"; "there"; "three"|]; [|"dave"; "vade"; "veda"|]; [|"earn"; "near"; "rena"|]; [|"brag"; "garb"; "grab"|]; [|"magneto"; "megaton"; "montage"|]; [|"earnest"; "eastern"; "nearest"|]; [|"den"; "end"; "ned"|]; [|"lair"; "liar"; "rail"|]; [|"alton"; "talon"; "tonal"|]; [|"now"; "own"; "won"|]; [|"earth"; "hater"; "heart"|]; [|"dire"; "reid"; "ride"|]; [|"bard"; "brad"; "drab"|]; [|"bare"; "bear"; "brae"|]; [|"earthen"; "hearten"; "teheran"|]; [|"kale"; "lake"; "leak"|]; [|"arnold"; "roland"; "ronald"|]; [|"cpu"; "cup"; "puc"|]; [|"earthy"; "hearty"; "thayer"|]; [|"abut"; "tabu"; "tuba"|]; [|"lame"; "male"; "meal"|]; [|"hank"; "kahn"; "khan"|]; [|"riot"; "tori"; "trio"|]; [|"dater"; "trade"; "tread"|]; [|"rite"; "tier"; "tire"|]; [|"army"; "mary"; "myra"|]; [|"alert"; "alter"; "later"|]; [|"acts"; "cast"; "scat"|]; [|"carven"; "cavern"; "craven"|]; [|"grate"; "great"; "greta"|]; [|"rosa"; "soar"; "sora"|]; [|"argot"; "gator"; "groat"|]; [|"demo"; "dome"; "mode"|]; [|"klein"; "kline"; "liken"|]; [|"lisa"; "sail"; "sial"|]; [|"cruel"; "lucre"; "ulcer"|]; [|"along"; "anglo"; "logan"|]; [|"listen"; "silent"; "tinsel"|]; [|"list"; "silt"; "slit"|]; [|"iran"; "nair"; "rain"|]; [|"baird"; "braid"; "rabid"|]; [|"dearth"; "hatred"; "thread"|]; [|"ape"; "epa"; "pea"|]; [|"eros"; "rose"; "sore"|]; [|"ares"; "sear"; "sera"|]; [|"acm"; "cam"; "mac"|]; [|"gas"; "gsa"; "sag"|]; [|"acme"; "came"; "mace"|]; [|"part"; "rapt"; "trap"|]; [|"argon"; "groan"; "organ"|]; [|"ester"; "steer"; "terse"|]; [|"acre"; "care"; "race"|]; [|"brain"; "brian"; "rabin"|]; [|"parse"; "spare"; "spear"|]; [|"bin"; "ibn"; "nib"|]; [|"result"; "rustle"; "ulster"|]; [|"armco"; "macro"; "marco"|]; [|"janos"; "jason"; "jonas"|]; [|"ante"; "nate"; "neat"|]; [|"goer"; "gore"; "ogre"|]; ...|]

This is one of the shortest and fastest of all the solutions on Rosetta Code. For example, the Haskell solution is more than twice as long at 8 lines of code.

Saturday, 13 February 2010

Channel 9 lectures by Don Syme: part 2

Microsoft's Channel 9 video blog has published the second lecture in a three-part series about the F# programming language by the creator of F# and .NET generics, Don Syme. This lecture covers functional data, pattern matching, basics of imperative programming and object oriented programming and sequences.

Channel 9 is Microsoft's free Learning Center for free technical training on emerging Microsoft products and technologies. This resource provides a set of courses with each course including a set of videos, hands-on labs, and source code samples to get you up-to-speed quickly.

Tuesday, 9 February 2010

F# for Visualization video tutorial

The following video walks through some of the basic functionality provided by our F# for Visualization library:

Sunday, 7 February 2010

Don Syme interviewed by Richard Morris

Richard Morris has published his Geek of the Week interview with Don Syme which is a favorite of ours not least because he quotes Don saying:

"If you look through the code samples in a book such as F# for Scientists they are breathtaking in their elegance, given what they achieve."

Thanks Don! Of course, if you liked our old book F# for Scientists you'll love our new book F# for Technical Computing.

Free e-book: The F# Survival Guide

John Puopolo and Sandy Squires teamed up to create a new e-book called The F# Survival Guide and have made it freely available on the web here. This e-book covers functional programming, numbers, strings, loops, tuples, generics, lists, sequences, record and union types, pattern matching, object oriented programming, exceptions, debugging, asynchronous programming and .NET interoperability.

Channel 9 F# lectures by Don Syme: part 1

Microsoft's Channel 9 video blog has published the first lecture in a three-part series about the F# programming language by the creator of F# and .NET generics, Don Syme. This lecture is an introduction to F# and includes a description of the language's ancestry and the motivation behind its creation as well as technical aspects of the language itself.

Channel 9 is Microsoft's free Learning Center for free technical training on emerging Microsoft products and technologies. This resource provides a set of courses with each course including a set of videos, hands-on labs, and source code samples to get you up-to-speed quickly.

Thursday, 4 February 2010

Tic-tac-toe demo

The F#.NET Journal just made another downloadable demo freely available. This demo is from the article Games programming: tic-tac-toe (31st December 2009).

The article walks through the design and implementation of a multithreaded program that uses logic programming to create an unbeatable computer opponent and Windows Presentation Foundation to provide a graphical user interface in only 115 lines of elegant F# code!

Wednesday, 3 February 2010

Force-based network visualization

The F#.NET Journal just published an article about dynamical simulation:

"Visualizing a graph or network of interconnected nodes is a fascinating and important challenge with a wide variety of practical applications. This article looks at a simple but effective force-based approach to the visualization of networks and uses Windows Presentation Foundation (WPF) to display the results in real-time as they emerge. The result is a compelling interactive animation of the layout algorithm as it progresses..."

To read this article and more, subscribe to The F#.NET Journal today!