Sunday, 26 February 2012

Bloom filter

A Bloom filter is a probabilistic alternative to a set. Membership in a bloom filter can be tested extremely quickly but only probabilistically, returning two values representing probable membership and definite non-membership.

The following class implements a purely functional Bloom filter in F#:
type BloomSet<'a when 'a: equality>(xs: seq<'a>) =
    let bits = System.Collections.BitArray(65536)
    let idx x n = (x >>> n) &&& 0xffff
      for x in xs do
        let x = x.GetHashCode()
        bits.[idx x 0] <- true
        bits.[idx x 16] <- true
    member __.ProbablyContains(x: 'a) =
      let x = x.GetHashCode()
      bits.[idx x 0] && bits.[idx x 16];;
type BloomSet<'a when 'a : equality> =
    new : xs:seq<'a> -> BloomSet<'a>
    member ProbablyContains : x:'a -> bool
The constructor creates a Bloom filter from the given sequence of elements. The ProbablyContains function tests for probable membership.

For example, the following demonstrates how the proportion of false positives rises with the number of elements in the Bloom filter for our fixed size (64kb) internal array:
for n in [1000; 10000; 100000] do
    let rand = System.Random()
    let xs = set [for _ in 1..n -> rand.NextDouble()]
    let s = BloomSet xs
    let ys = set [0.0 .. 0.001 .. 1.0] - xs
    let count x = if s.ProbablyContains x then 1 else 0
    let pFail = float (Seq.sumBy count ys) / float ys.Count
    printfn "%0.2f%% false positives when n=%d" (100.0 * pFail) n;;
0.10% false positives
7.89% false positives when n=10000
82.02% false positives when n=100000
Just like sets, Bloom filters have many practical applications.

1 comment:

Aggelos Biboudis said...

And by coincidence, some hours ago I was working on a Reactive Extensions version.
Excerpt at