The previous post described a performance-critical pricer for American options. Upon reflection, the optimized Mathematica code had introduced many redundant computations in order to vectorize the code because vectorization offers considerable performance improvements in such a slow language. Our simple translation of the optimized Mathematica had not taken advantage of this by removing these redundant computations.
This new optimized and parallelized F# solution is now a whopping 960× faster than the original Mathematica code from Sal Mangano's Mathematica Cookbook:
let americanPut kk r sigma tt = let sqr x = x * x let ikk, a, nn, mm, tt0 = 1.0 / kk, 5.0, 1000, 200, sqr sigma * tt / 2.0 let k, h, s = 2.0 * r / sqr sigma, 2.0 * a / float nn, tt0 / float mm let alpha, k0, k1 = s / sqr h, 0.5 * (k - 1.0), 0.25 * sqr (k + 1.0) let x i = float i * h - a let ex = Array.init (nn+1) (fun i -> exp(x i)) let ek0x = Array.init (nn+1) (fun i -> exp(k0 * x i)) let pp0 i = max 0.0 (kk - kk * ex.[i]) let ek0a, k1s, a' = exp(k0 * a), k1 * s, 2.0 * alpha - 1.0 let mutable u = Array.init (nn+1) (fun i -> ek0x.[i] * pp0 i * ikk) let mutable u' = Array.create (nn+1) 0.0 for j=0 to mm-1 do let ek1sj = exp(k1s * float j) let k2 = alpha * ikk * ek0a * ek1sj u'. <- alpha * u. - a' * u. + k2 |> max (ek0x. * ek1sj * (1.0 - ex.)) let inline get (u: _ ) i m = max m (alpha * (u.[i+1] + u.[i-1]) - a' * u.[i]) for i=1 to nn/2 do u'.[i] <- get u i (ek0x.[i] * ek1sj * (1.0 - ex.[i])) for i=nn/2 to nn-1 do u'.[i] <- get u i 0.0 let u'' = u u <- u' u' <- u'' let k3 = kk * exp(-k1 * tt0) Seq.mapi (fun i u -> kk * ex.[i], k3 / ek0x.[i] * u) u Array.Parallel.init 1000 (fun i -> americanPut (float(i+1)) 0.05 0.14 1.0)
Note that the hardcoded constants nn and mm have been increased, sigma has been decreased to keep alpha just below 0.5 and strikes from 1 to 1,000 are computed in order to make the F# code take a significant fraction of the second (when the Mathematica takes several minutes!).