Sunday, 23 January 2011

Gradient descent

One of the most popular questions we receive about the old book F# for Scientists is how the gradient descent example can be rewritten to work with the current version of F# in Visual Studio 2010 rather than the early prototype covered in the book, running under Visual Studio 2005.

We begin by referencing the F# PowerPack and its Compatibility extension:

> #r "FSharp.Powerpack.dll";; --> Referenced 'C:\Program Files\FSharpPowerPack-2.0.0.0\bin\FSharp.Powerpack.dll' > #r "FSharp.Powerpack.Compatibility.dll";; --> Referenced 'C:\Program Files\FSharpPowerPack-2.0.0.0\bin\FSharp.Powerpack.Compatibility.dll' > #nowarn "62";;

Next, we define a small number that we be used to calculate numerical approximations to derivatives:

> let δ = epsilon_float ** (1.0 / 3.0);; val δ : float = 6.055454452e-06

The following function repeatedly applies the given function to the given initial value until the result stops changing:

> let rec fixedPoint f x = let f_x = f x if f_x = x then x else fixedPoint f f_x;; val fixedPoint : ('a -> 'a) -> 'a -> 'a when 'a : equality

The numerical approximation to the grad of a scalar field is built up from partial derivatives in each direction:

> let partialD f_xs f (xs : vector) i xi = xs.[i] <- xi + δ try (f xs - f_xs) / δ finally xs.[i] <- xi;; val partialD : float -> (vector -> float) -> vector -> int -> float -> float

The following function performs a single iteration of gradient descent by scaling the step size λ by either α or β if the result increases or decreases the function being minimized, respectively:

> let descend α β f (f': _ -> vector) (λ, xs, f_xs) = let xs_2 = xs - λ * f' xs let f_xs_2 = f xs_2 if f_xs_2 >= f_xs then α * λ, xs, f_xs else β * λ, xs_2, f_xs_2;; val descend : float -> float -> (Vector<float> -> 'a) -> (Vector<float> -> vector) -> float * Vector<float> * 'a -> float * Vector<float> * 'a when 'a : comparison

Finally, the following function uses the gradient descent algorithm to minimize a given function and derivative:

> let gradientDescent f f' xs = let _, xs, _ = fixedPoint (descend 0.5 1.1 f f') (δ, xs, f xs) xs;; val gradientDescent : (Vector<float> -> 'a) -> (Vector<float> -> vector) -> Vector<float> -> Vector<float> when 'a : comparison

For example, the following computes a numerical approximation to the derivative of a function:

> let grad f xs = Vector.mapi (partialD (f xs) f xs) xs;; val grad : (vector -> float) -> vector -> vector

And the following defines the famous Rosenbrock "banana" function that is notoriously difficult to minimize due to its curvature around the minimum:

> let rosenbrock (xs: vector) = let x, y = xs.[0], xs.[1] pown (1.0 - x) 2 + 100.0 * pown (y - pown x 2) 2;; val rosenbrock : vector -> float

The minimum at (1, 1) may be found quickly and easily using the functions defined above as follows:

> let xs = vector[0.0; 0.0] |> gradientDescent rosenbrock (grad rosenbrock);; val xs : Vector<float> = vector [|0.9977180571; 0.99543389|]

For the latest in-depth coverage of the F# programming language, read Visual F# 2010 for Technical Computing.

7 comments:

Art said...

YYYYAAAAAYYYYYY!
Now for "Refolding"

Anonymous said...

Thanks.
Can I ask how do you type in δ etc?

Flying Frog Consultancy Ltd. said...

@Anonymous: Good question. I actually google "unicode delta" and copy-paste from the resulting web page into the source code.

Art said...

http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Block=Mathematical_Operators:]

Art said...

Hi Jon, any chance of a future article on conjugate-gradient?
From IOD's RPP Discrete Comput Geom (2009) 41: 444–460
Sorry for my poor html math notation.

"JJTl = JD +αe, (7)
KKTl = KD +αf (9)
(7) and (9) can be solved efficiently using the conjugate-gradient method"

Art said...

Revisiting this, I get it to work in Interactive mode FSI, ... but not in VS'10.
xs : vector err 'vector' not defined, and still 'epsilon_float' not defined ... ?

Art said...

Linear Regression Gradient Descent
fssnip Grad Is there an animated version somewhere?