Sunday, 27 December 2009

Evolution: the weasel program

An interesting challenge is called Evolutionary Algorithm and draws upon a famous computer simulation written by Richard Dawkins to demonstrate the power of random variation and non-random cumulative selection in natural and artificial evolutionary systems, and how this process differs from chance.

The program begins with a random string of letters and spawns a generation of 200 random mutations of the parent string, selects the fittest mutation (by similarity to an "ideal" string) and uses that individual to spawn the next generation. The ideal string used is "METHINKS IT IS LIKE A WEASEL".

The results of this program clearly demonstrates how selection can drive random processes to solve problems extremely efficiently: only 6 generations were required to reproduce the ideal string exactly. In fact, randomized algorithms are of great practical importance and a similar technique, simulated annealing, is used as an example in our F# for Technical Computing book.

Our F# solution is as follows:

let target = "METHINKS IT IS LIKE A WEASEL"
let charset = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
 
let rand = System.Random()
 
let fitness (trial: string) =
  Seq.zip target trial
  |> Seq.fold (fun d (c1, c2) -> if c1=c2 then d+1 else d) 0
 
let mutate parent rate _ =
  String.mapi (fun _ c ->
    if rand.NextDouble() < rate then c else
      charset.[rand.Next charset.Length]) parent
 
do
  let mutable parent =
    String.init target.Length (fun _ ->
      charset.[rand.Next charset.Length] |> string)
  let mutable i = 0
  while parent <> target do
    let pfit = fitness parent
    let best, f =
      Seq.init 200 (mutate parent (float pfit / float target.Length))
      |> Seq.map (fun s -> (s, fitness s))
      |> Seq.append [parent, pfit]
      |> Seq.maxBy (fun (_, f) -> f)
    if i % 100 = 0 then
      printf "%5d - '%s'  (fitness:%2d)\n" i parent f
    parent <- best
    i <- i + 1
  printf "%5d - '%s'\n" i parent

This program produces the following output:

    0 - 'CEUMIDXSIXOOTSEHHXVMD IHTFWP'  (fitness: 6)
  100 - 'PEPHIZLB NGSIO LCWE AQEKCSZQ'  (fitness:11)
  200 - 'MESHIZHB IQ IO LTWGGAQWMKSRX'  (fitness:13)
  300 - 'MESHIZHB IQ IO LTWGGAQWMKSRX'  (fitness:13)
  400 - 'METHIVKS ITLIN LYKJPABWDASEU'  (fitness:19)
  500 - 'METHINKS IT IB LIKEFA WDASEL'  (fitness:25)
  518 - 'METHINKS IT IS LIKE A WEASEL'

1 comment:

Anonymous said...

Very interesting.

One small comment. I think you meant to say "in less that 600 generations", not "just 6". The program only prints one line every 100 generations.

I wonder how much the length of the target affects the number of generations required to reach it by random mutation. Mmmh. There's one way to find out...

[Fires up VS 2010]