Tuesday, 1 October 2013

Human-readable string joining in F#

Eric Lippert (formerly of Microsoft) posted an interesting blog post challenging readers to write a program that converts a sequence of strings into a single string by comma separating all but the last string, which is separated by the word “and” instead. Let’s consider this challenge in F#...

Perhaps the simplest and most maintainable solution is to convert the sequence into an array and act upon that instead:

let concat (xs: seq<string>) =
  let xs = Array.ofSeq xs
  match xs.Length with
  | 0 -> "{}"
  | 1 -> sprintf "{%s}" xs.[0]
  | n -> sprintf "{%s and %s}" (String.concat ", " xs.[..n-2]) xs.[n-1]

Pattern matching over the length of the array allows us to handle the three separate cases elegantly. Furthermore, in the final case it allows us to bind the length of the array to the variable n. In the last two cases we use the C-like sprintf function to compose several strings in one go. The final case also uses an array slice to obtain all elements except the last.

This function is easily tested in F# Interactive as follows:

> List.map concat [[]; ["A"]; ["A"; "B"]; ["A"; "B"; "C"]];;
val it : string list = ["{}"; "{A}"; "{A and B}"; "{A, B and C}"]

Let’s look at some alternative solutions. First up, an elegant (almost) purely functional solution based upon a state machine with three states:

type State =
  | Empty
  | Singleton of string
  | And of System.Text.StringBuilder * string

The following function transitions the state machine from one state to another as new characters are encountered:

let transition a x =
  match a with
  | Empty -> Singleton x
  | Singleton y -> And(System.Text.StringBuilder("{" + y), x)
  | And(sb, y) -> And(sb.Append(", " + y), x)

The main concat function then folds the transition function over the sequence of strings and acts upon the final accumulator:

let concat xs =
  match Seq.fold transition Empty xs with
  | Empty -> "{}"
  | Singleton x -> "{" + x + "}"
  | And(sb, y) -> sb.Append(" and " + y + "}").ToString()

The three states are equivalent to the three pattern matches in our simple solution. Technically, the .NET string builder is not a purely functional data structure but that actually has no effect on our code here.

Another, more traditional, solution is to encode the state machine as an imperative program using conditions and loops. This may be written as follows in F#:

let concat (xs: seq<string>) =
  use e = xs.GetEnumerator()
  if e.MoveNext() then
    let s = e.Current
    if e.MoveNext() then
      let sb = System.Text.StringBuilder("{" + s)
      let mutable s = e.Current
      while e.MoveNext() do
        sb.Append(", " + s) |> ignore
        s <- e.Current
      sb.Append(" and " + s + "}").ToString()
    else "{" + s + "}"
  else "{}"

This produces the same result as the two previous solutions but is the most difficult to understand because the state is implicitly represented by the “program counter”.

Finally, a rather contrived solution that converts the input sequence into an array but creates the output string using the F# String.init function:

let concat (xs: seq<string>) =
  let xs = Array.ofSeq xs
  let n = xs.Length
  let len = if n<2 then 2+n else 1 + 2*n
  String.init len (fun i ->
    match i, len-1-i with
    | 0, _ -> "{"
    | _, 0 -> "}"
    | i, _ when i%2=1 -> xs.[i/2]
    | _, 2 -> " and "
    | _ -> ", ")

The last point of interest is the performance of these different functions when applied to a very long input string. As none of these functions uses repeated concatenation they are all linear time. When given ten million strings on this machine the simplest solution takes 0.79s, the functional state machine takes 0.81s, the imperative state machine takes 54s and the solution using String.init takes just 0.38s. Given such a small performance difference, the winning solution in the vast majority of cases is the simplest solution.

To see more F# samples and learn more tips and tricks, subscribe to the F# Journal today!

No comments: