Tuesday, 6 March 2007

Fast Fourier Transforms

I've spent the past few weeks trying to pin down the best possible way to do some core computations used in scientific computing from .NET and the F# programming language.

Several of the examples from my book OCaml for Scientists use these techniques via libraries like FFTW and LAPACK. I'll translate some of them into F# for my forthcoming book F# for Scientists. I cannot over-emphasise the importance of these algorithms. They underpin so many different kinds of analysis used by scientists and picking the wrong implementation can cost you months of wasted work.

First up is the Fast Fourier Transform. This algorithm underpins many spectral-based methods (including time-frequency methods) and it is very difficult to write a good (fast and accurate) Fast Fourier Transform routine. In particular, the infamous Numerical Recipies presents a lot of misinformation on the subject and should be avoided.

I have tried dozens of FFT implementations in the past and found two good ones. Under Linux there is the excellent FFTW library by Matteo Frigo and Steven G. Johnson. This library actually handles the FFTs for Matlab. Also, Mathematica has an excellent FFT algorithm built-in that works not only with ordinary floats but also with interval arithmetic and even arbitrary-precision arithmetic!

So, what is available under .NET? There are some expensive commercial offerings as well as some free ones. I'll start by describing the common problems before getting onto the implementations themselves.

Generality: an FFT routine must be able to handle all lengths of input "n" from zero upwards. Surprisingly often, FFT routines are distributed that only handle integral powers of 2. More worryingly, scientists and engineers are often told that padding with zeros up to the next integral power of 2 is "ok" (it is not!) and that power of 2 sizes are much faster to transform (they are not!). Handling arbitrary sizes is extremely important because the errors introduced by zero padding can be severe and a significant amount of work is required to prove that the artificial errors do not affect the results.

Performance: an FFT routine suitable for general use must be able to maintain O(n log n) complexity for all input sizes "n". Prime factor decomposition can get you a long way there but you don't want O(n^2) when you try to transform a prime-length array, so that technique must be combined with something like Bluestein's convolution method to recover small prime factors albeit with a bigger constant prefactor.

As it turns out, the best free implementation that I've found for F# users so far is still the fantastic FFTW library. Their site has a precompiled Windows DLL. I've written minimal bindings that allow thread-safe access to FFTW from F#, with both guru and simple interfaces. Performance is excellent, 32-bit Windows XP Pro is only up to 35% slower than 64-bit Linux.

I'll publish the 90-line bindings ASAP and F# for Scientists will detail how the bindings are written and how they can be used as well as some complete example programs using this library to perform FFTs.

Faced with a lack of O(n log n) implementations suitable for inclusion in commercial products, we wrote our own implementations in C# and F#. The C# implementation is now available on our site:

http://www.ffconsultancy.com/products/signal_processing_net/

For end users wanting to compute FFTs cheaply, my recommendation is definitely for the free FFTW library. If you're writing a commercial product then you can buy a licence for it from MIT or, if a million dollars is out of your budget, you can buy our FFT implementation for only £99.

I'll disseminate my results on matrix computation next time...

Cheers,
Jon.

2 comments:

sigfpe said...

If you want to be realy helpful, you really ought to point out exactly where the misinformation in NR lies.

Jon Harrop said...

My main objection to NR is their complete lack of discussion about all "n" transforms, which are vitally important.

Luckily many other people have already criticised NR publically. For example, the Wikipedia page for NR currently quotes the authors of FFTW stating that the routine in NR is 3-10x slower than FFTW.

Such discrepancies can be attributed to NR being very out of date and performance changing considerably due to changes in CPU architecture and the widening memory gap.