Friday, 4 May 2012

Select elements that appear more than once

Here is a purely functional solution:
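let duplicates xs =
  (Map.empty, xs)
  // Thread a map through the input: false = seen once, true = seen more than once.
  ||> Seq.scan (fun xs x ->
      match Map.tryFind x xs with
      | None -> Map.add x false xs
      | Some false -> Map.add x true xs
      | Some true -> xs)
  // Pair each element with the map as it was before that element was processed.
  |> Seq.zip xs
  |> Seq.choose (fun (x, xs) ->
      match Map.tryFind x xs with
      | Some false -> Some x
      | None | Some true -> None)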
This uses a map to track whether each element has been seen once or more than once, and emits an element the first time it is duplicated, that is, at its second occurrence.
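To make the state tracking concrete, here is a minimal sketch that traces the scan on a small input. It assumes the elements are paired with the scanned states via Seq.zip; the names step, input, states and trace are illustrative only and do not appear in the solution above.

```fsharp
// Same transitions as the scan in the functional solution:
// absent -> false (seen once) -> true (seen more than once).
let step seen x =
  match Map.tryFind x seen with
  | None -> Map.add x false seen
  | Some false -> Map.add x true seen
  | Some true -> seen

let input = [1; 2; 1; 1]

// Seq.scan yields the initial state first, so there is one more state than inputs.
let states = Seq.scan step Map.empty input |> Seq.toList

// Pair each element with the state as it was before that element was processed.
let trace =
  Seq.zip input states
  |> Seq.map (fun (x, before) -> x, Map.tryFind x before)
  |> Seq.toList
// trace = [(1, None); (2, None); (1, Some false); (1, Some true)]
```

Only the third pair carries Some false, so only the second occurrence of 1 would be emitted; the third occurrence sees Some true and is skipped.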
Here is a faster imperative version:
let duplicates (xs: _ seq) =
  seq { let d = System.Collections.Generic.Dictionary(HashIdentity.Structural)
        let e = xs.GetEnumerator()
        while e.MoveNext() do
          let x = e.Current
          let mutable seen = false
          if d.TryGetValue(x, &seen) then
            // Seen before: report x only on its second occurrence.
            if not seen then
              d.[x] <- true
              yield x
          else
            // First occurrence: record x as seen once.
            d.[x] <- false }
Enumerating the sequence with a for x in xs do loop is substantially slower than calling GetEnumerator directly. Writing your own enumerator by hand, however, is not significantly faster than using a computation expression with yield.
Note that calling the TryGetValue member of Dictionary with an explicit out argument avoids allocation in the inner loop by mutating a stack-allocated value, whereas the tuple-returning form of TryGetValue that F# offers allocates its return tuple.
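The two call styles can be compared side by side in a short sketch. The helper names lookupByRef and lookupTuple and the string-keyed dictionary are hypothetical, chosen only for illustration. Whether the tuple is really allocated may depend on the compiler version and optimization settings, so treat the allocation comment as the claim above restated rather than a measured result.

```fsharp
open System.Collections.Generic

// Out-parameter style: the value is written into a mutable local through a byref,
// so the call itself does not need to build a result object.
let lookupByRef (d: Dictionary<string, bool>) (key: string) =
  let mutable value = false
  d.TryGetValue(key, &value) && value

// Tuple style: F# returns the success flag and the out value together as a tuple,
// which may be allocated on each call.
let lookupTuple (d: Dictionary<string, bool>) (key: string) =
  match d.TryGetValue key with
  | true, value -> value
  | false, _ -> false

let d = Dictionary<string, bool>()
d.["a"] <- true
```

Both helpers return the same answer for any key; they differ only in how the looked-up value travels back to the caller.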

