## Sunday, 26 February 2012

### Bloom filter

A Bloom filter is a probabilistic alternative to a set. Membership in a bloom filter can be tested extremely quickly but only probabilistically, returning two values representing probable membership and definite non-membership.

The following class implements a purely functional Bloom filter in F#:
```> type BloomSet<'a when 'a: equality>(xs: seq<'a>) =
let bits = System.Collections.BitArray(65536)

let idx x n = (x >>> n) &&& 0xffff

do
for x in xs do
let x = x.GetHashCode()
bits.[idx x 0] <- true
bits.[idx x 16] <- true

member __.ProbablyContains(x: 'a) =
let x = x.GetHashCode()
bits.[idx x 0] && bits.[idx x 16];;
type BloomSet<'a when 'a : equality> =
class
new : xs:seq<'a> -> BloomSet<'a>
member ProbablyContains : x:'a -> bool
end

```
The constructor creates a Bloom filter from the given sequence of elements. The ProbablyContains function tests for probable membership.

For example, the following demonstrates how the proportion of false positives rises with the number of elements in the Bloom filter for our fixed size (64kb) internal array:
for n in [1000; 10000; 100000] do
```    let rand = System.Random()
let xs = set [for _ in 1..n -> rand.NextDouble()]
let s = BloomSet xs
let ys = set [0.0 .. 0.001 .. 1.0] - xs
let count x = if s.ProbablyContains x then 1 else 0
let pFail = float (Seq.sumBy count ys) / float ys.Count
printfn "%0.2f%% false positives when n=%d" (100.0 * pFail) n;;
0.10% false positives
7.89% false positives when n=10000
82.02% false positives when n=100000

```
Just like sets, Bloom filters have many practical applications.

#### 1 comment:

Aggelos Biboudis said...

And by coincidence, some hours ago I was working on a Reactive Extensions version.
:):)
Excerpt at https://gist.github.com/1931336