Friday, 9 July 2010

F# vs Mathematica: fast pricer for American options

UPDATE: A new F# solution that is 960× faster than the original Mathematica is available here.

Another example from Sal Mangano's Mathematica Cookbook, this time contributed by Andreas Lauschke, is a "fast" pricer for American options. The relevant section of the book stresses the importance of performance in this context, and the Mathematica code presented was heavily optimized by experts. Here is their code:

Specifically, this function has been optimized by pushing the computational burden from Mathematica's general-purpose term-rewriting language onto the optimized C routines in its standard library, and then by compiling the high-level code into lower-level bytecode that is interpreted more efficiently, giving another 5× speedup.

However, this general approach to the optimization of Mathematica code has the unfortunate side effect of "foresting": introducing unnecessary intermediate data structures because built-in operations over them are faster than writing a direct solution in such a slow language. Consequently, a direct solution in a compiled language will usually be orders of magnitude faster. In this case, we found that the following translation to F# is 64× faster than Sal's original Mathematica benchmark:

let americanPut kk r sigma tt =
  let sqr x = x * x
  let a, nn, mm, tt0 = 5.0, 100.0, 20, sqr sigma * tt / 2.0
  let k, h, s = 2.0 * r / sqr sigma, 2.0 * a / nn, tt0 / float mm
  let alpha, xn = s / sqr h, int(2.0 * a / h + 0.5) + 1
  let x i = float i * h - a
  let ss = Array.init xn (fun i -> kk * exp(x i))
  let k0, k1 = 0.5 * (k - 1.0), 0.25 * sqr (k + 1.0)
  let k1s = k1 * s
  let pp0 i = max 0.0 (kk - ss.[i])
  let rec run (u: _ []) (u': _ []) j =
    if j = mm then u else
      let f i =
        let xi = -a + float i * h
        if xi >= 0.0 then 0.0 else
          exp(k0 * xi + k1s * float j) * (1.0 - exp xi)
      u'.[0] <- alpha * u.[1] - (2.0 * alpha - 1.0) * u.[0] +
                  alpha / kk * exp(k0 * a + k1s * float j)
                |> max (f 0)
      for i = 1 to u.Length - 2 do
        u'.[i] <- alpha * (u.[i+1] + u.[i-1]) - (2.0 * alpha - 1.0) * u.[i]
                  |> max (f i)
      run u' u (j+1)
  let u i = exp(k0 * x i) * pp0 i / kk
  let u = run (Array.init xn u) (Array.create xn 0.0) 0
  ss, Seq.mapi (fun i u -> kk * exp(-k1 * tt0 - k0 * x i) * u) u

Array.Parallel.init 10000 (fun strike ->
  let ss, ps = americanPut (float(strike+1)) 0.05 0.4 1.0
  Seq.zip (Seq.take 60 ss) (Seq.take 60 ps))

The superior performance of the F# solution stems from two main factors:

  • The F# program is compiled all the way down to machine code before being executed, whereas the Mathematica code is interpreted. Compilation also makes it efficient to replace the intermediate data structures with simple function calls. This alone makes the F# solution over 10× faster than the original Mathematica.

  • F# inherits a highly-optimized, multicore-capable run-time from the .NET platform. This allows us to use fine-grained parallelism to improve performance even further, for another 6.1× speedup on this 8-core machine (a timing sketch follows below).
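For anyone who wants to reproduce the parallel speedup, here is a minimal timing sketch, assuming the americanPut function above is in scope. The time helper and the strike count are illustrative rather than the exact harness used to produce the figures quoted above:

open System.Diagnostics

// Illustrative harness: time sequential vs. parallel evaluation of the
// pricer over 10,000 strikes and print the elapsed wall-clock times.
let time label f =
  let sw = Stopwatch.StartNew()
  let result = f ()
  printfn "%s: %.3fs" label sw.Elapsed.TotalSeconds
  result

let price strike =
  let ss, ps = americanPut (float (strike + 1)) 0.05 0.4 1.0
  Seq.zip (Seq.take 60 ss) (Seq.take 60 ps) |> Seq.toArray

let sequentialResults = time "sequential" (fun () -> Array.init 10000 price)
let parallelResults = time "parallel" (fun () -> Array.Parallel.init 10000 price)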

17 comments:

Leonid said...

While I will not be able to close this huge performance gap, one can squeeze some more out of Mathematica. I optimized Sal's code to get another factor of 2 speed improvement. Perhaps this is not the end, but further improvements are significantly harder to find. If the code does not come out nicely indented, my apologies.

americanPutCompiledAlt =
  Compile[{kk, r, sigma, tt},
    With[{a = 5, nn = 100, mm = 20, tt0 = sigma^2 tt/2, k = 2 r/sigma^2},
      Module[{alpha, h = 2 a/nn, s = tt0/mm, x, ss, tmax, f, pp0, u, z,
          mflags, fparts, fconst},
        alpha = s/h^2;
        x = Range[-a, a, h];
        ss = kk*Exp[x];
        tmax = Clip[1 - Exp[x], {0, 1}];
        pp0 = Clip[#, {0, Max[#]}] &[kk - ss];
        z = u = Exp[0.5 (k - 1) x] pp0/kk;
        fparts = alpha/kk Exp[0.5 (k - 1) a + 0.25 (k + 1)^2 Range[0, mm - 1] s];
        fconst = Exp[1/2 (k - 1)*x]*tmax;
        Do[
          z[[1]] = alpha*u[[2]] - (2 alpha - 1) u[[1]] + fparts[[j]];
          z[[2 ;; -2]] =
            alpha (Take[u, {3, nn + 1}] + Take[u, {1, nn - 1}]) -
              (2 alpha - 1)*Take[u, {2, nn}];
          z[[-1]] = 0;
          f = fconst*Exp[1/4 (k + 1)^2*(j - 1)*s];
          mflags = UnitStep[z - f];
          u = mflags*z + (1 - mflags) f;,
          {j, mm}];
        {ss, kk*Exp[-0.25 (k + 1)^2 tt0] Exp[-0.5 (k - 1) x] u}]]];

Leonid said...

Jon,

The second comment is on parallelism: I have a 6-core machine, but only four cores are used by the Mathematica 7 parallel routines. Now, the following simple modifications give about a 10x speed-up on my machine:

(results = Table[americanPutCompiledAlt[i,0.05,0.4,1],{i,10000}]);//Timing

{2.203,Null}

DistributeDefinitions[americanPutCompiledAlt];
(parallelresults = ParallelTable[americanPutCompiledAlt[i,0.05,0.4,1],{i,10000}]);//Timing

{0.281,Null}

results == parallelresults

True

I don't know how to explain such a dramatic speed-up. Perhaps the Timing measurement is not quite accurate for parallel stuff, but it certainly gives a 4x improvement out of 4 cores (this I checked), which would probably mean 8 times out of 8 cores. Another factor of 2 comes from the code optimization I posted in the previous comment. This gives a factor of 16 vs your 64, which is no longer two orders of magnitude. This is probably not the fine-grained parallelism you meant, though - it won't help much for a single function call.


Meanwhile, both Sal's code and my optimization of it are quite direct - I don't see any extraneous structures, only those needed to solve the problem. So, upon thinking it over, I find your argument exaggerated, to say the least. Not to mention that the level of efficiency of your F# code reflects the fact that you are an F# expert.

Generally, your representation of Mathematica seems quite unfair. It has a different niche than F# and is more a tool to aid research and quick exploration, designed mostly for people without an extensive programming background. But, as you are of course well aware, even as a programming language it is not the toy you are trying to make of it - you can do serious work with it too.

Flying Frog Consultancy Ltd. said...

@Leonid: Thanks for the optimizations. I'll try them ASAP but note that Mathematica is limited to 4 cores so it will not benefit from half of the cores in my desktop. That is still a considerable performance improvement but, of course, it will only be fair once those optimizations are ported to the F# code. I suspect the F# code will see a similar speedup...

Regarding expertise, I actually have far more expertise with Mathematica than I do with F#. I have been a Mathematica user for over 10 years and have written a substantial amount of code in it, some of which was later turned into a product. I have been bitten countless times by silly bugs in Mathematica and so have our customers, so I cannot recommend it for serious use. I ditched Mathematica in favor of OCaml 5 years ago and, although I love playing with it, I would never use Mathematica for serious work now. At the very least, check the answers it gives you very carefully!

Flying Frog Consultancy Ltd. said...

Regarding the extraneous data structures: the matrix "f" is one example here. It exists as an explicit data structure in the Mathematica version only to facilitate the compound MapThread with Max over it, because that is far more efficient than element-wise operations in Mathematica. Each element of the matrix is used only once, so in a compiled language it is far more efficient to make "f" a function and call it to obtain each result, a single number, on demand, without ever creating the (~10kB) intermediate data structure on every invocation of the main function.
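To make the contrast concrete in F# terms, here is a minimal sketch of the two styles; the names precomputed and onDemand are illustrative and do not appear in the code above:

// Vectorized style: materialize every value of f up front, as the
// Mathematica code must, allocating an intermediate array on each call.
let precomputed n (f: int -> float) (u: float []) =
  let fs = Array.init n f
  Array.init n (fun i -> max u.[i] fs.[i])

// On-demand style: compute each f i exactly once, at its single use
// site, so no intermediate array is ever created.
let onDemand n (f: int -> float) (u: float []) =
  Array.init n (fun i -> max u.[i] (f i))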

Conversely, this kind of restructuring is how you optimize Mathematica code: you try to replace explicit loops with vector operations, because that replaces a slowly-interpreted loop with a fast compiled loop in the C code of Mathematica's internals. In terms of the actual implementation, Mathematica special-cases built-in functions passed as arguments to higher-order functions, like Max passed to MapThread, invoking the internal C function for Max directly through its pointer without using the term rewriter at all.

Leonid said...

Jon,

I do not completely disagree with you. The presence of bugs which one has to wait to get fixed is a problem with any large commercial software package. This is a problem, I agree. But the strength of Mathematica (as a programming language, not to mention its symbolic capabilities) is in the integration of many disjoint pieces of functionality (libraries), and in a design which allows them to interoperate and lets the user drive them from a uniform high-level scripting language. I think I actually saw this very argument in one of your posts somewhere on the newsgroups a while back.

Mathematica then allows me to do things I would not realistically be able to do otherwise (especially in the exploratory phase, where I have only a vague idea of what I am aiming at), just because it makes combining several different areas easy. Perhaps a combination of Mathematica with some production language, with Mathematica as the dynamic scripting and prototyping layer, can be more beneficial in "mission-critical" development than the use of only one of these tools. For the near future it will likely be Java/Clojure for me (just because I have some production Java experience, Java/Mathematica integration seems rather straightforward, and it is cross-platform), and maybe some C. But actually, ML/OCaml are also on my wish-list of things to learn.

Regarding optimizations: I am also aware of what happens under the hood when we optimize code in Mathematica, maybe not at the level of the kernel developers, but enough for my needs.
My experience is that the most efficient Mathematica code comes from a combination of Compile and the use of packed arrays wherever possible inside Compile (because packed-array operations also bypass the bytecode and are thus yet more efficient - about as fast as C code). At times, packed arrays inside Compile can give another order of magnitude speedup with respect to Compile alone. A simple example is here.

Regarding extraneous structures: take a careful look at the code I posted - I have eliminated the matrix "f"; my "f"-s are now lists created in the loop on demand. Therefore, I expect my version to also be more memory-efficient, FWIW. I cannot get rid of lists in favor of numbers, because then I would indeed lose the advantage that packed arrays (vectorized operations) give me. But I don't think vectors are inappropriate structures here - code written with them does not look unnatural to me (for the matrix-valued f of the original code, I tend to agree with you). Some (rather minor) optimizations I used are indeed problem-specific, for example the factorization of exponents.

Andreas said...

Part 1

Jon,
I am the original contributor of this recipe. I'd like to make a few comments.

"a direct solution in a compiled language will usually be orders of magnitude faster."

Well, of course. Compile[] still creates bytecode, not machine code, and it was never intended to compete with machine code. It speeds things up on the order of 3-25× compared to un-Compile[]'d M code, depending on the particular problem, and for that I can appreciate what Compile[] does. It's not bad just because it doesn't provide machine-code speed.

"This alone makes the F# solution over 10× faster than the original Mathematica."

This has nothing to do with F#. My Java and C# versions are also much faster. Also perl, Python, Ruby, C, C++, Fortran would be much faster. I have done a lot of performance benchmarking of several programming languages over several years, and I know that you can generally expect "out of the box" Java to be 2 - 5 times faster, and C# 5 - 10 times faster than Compile[]'d M. But that is an unfair comparison. The prowess of M doesn't come from the "speed" of Compile[]'d code. It comes from its internal algorithms, and these are the best in the world. It's quite unfair to hold against a state-of-the-art submarine that it cannot fly from New York to Frankfurt, and likewise it would be unfair to hold against a state-of-the-art plane that it cannot dive from California to Japan. You can't hold against something that it cannot do what it was never intended to do. It would be just as unfair if I held against F# that it doesn't have all the world's best math algorithms built-in, doesn't have symbolic math capabilities, doesn't have intuitive graphics, doesn't have on-the-fly interactivity, etc. It was never intended for that.

I absolutely LOVE F#, basically since MSFT was ready to release it. But Sal's book is about best practices (at least, good solutions) implemented in M! What's wrong with showing superior M solutions in an M book? It would be just as unfair if I took an F# book and criticized that F# cannot do what M can.

Andreas said...

Part 2

While I agree with your first bullet point (but only taken as a statement, not an "explanation", because you make unfair comparisons), I don't agree with the second. The algorithm is inherently single-threaded, as is usual for grid methods for pricing options. Taking 10,000 runs of the SAME algorithm in parallel doesn't prove any point: you can take ANY algorithm and run it several times in parallel. And, as Leonid has shown, you can run the function in parallel in M as well. My Java implementation actually uses the concurrency framework we have had since Java 5 to use worker threads, with incredible speed results.

(And, a 10,000 strike for a stock option is kinda high ... the only stock I know in that area was Berkshire Hathaway several years ago. You leave the base interval a and the sizes N and M unchanged ... that doesn't produce good numbers anyway.)

If you REALLY want the fastest execution, may I suggest Fortran or Assembler? But again, the point of this recipe was to show a very fast solution to an M (!!!) user, as this is an M book! (not to provide the highest speed attainable in other languages/platforms)

By the way, I am about to release an M package that allows the user to define the Java code of a static method in a string, and then call that method under its method name (as a symbol in M). It's the ACTUAL Java code running in the JVM, which is particularly useful for extensive loops. That is another way to get compiled Java speed directly from the convenience of M, and it beats Compile[] easily.
(www.lauschkeconsulting.com/jcc.html -- still in beta stage)

Leonid,
your improvements are great, and when Sal's website www.mathematicacookbook.com is ready, they can be contributed as an improvement to my recipe. I like your idea of Clip and UnitStep, and the splitting of f. However, Clip and UnitStep don't compile, as you can see by inspecting americanPutCompiledAlt[[4]]. But that doesn't really matter. There are a few functions in M where compilation may actually slow things down due to very efficient internal structures (NestWhile and NestWhileList are two examples), and if M has to go through the evaluator for only a FEW exceptions, that is still OK (just don't end up with a long list of uncompiled symbols).

Also, regarding your comment about packed arrays: I can't quite tell from what you write whether you understand that lists returned by Compile[] are ALWAYS packed. You can rely on that.

Leonid said...

@Andreas,

Andreas, thanks for your appreciation. It took me some time to figure out these optimizations - your code did not leave much room for improvement.

I did examine the bytecode instructions, of course (I always do), and realized that Clip did not compile. It does not matter for Clip, and if you examine the original code you will also observe that certain functions such as Take and some internal iterators are left in byte code uncompiled. My experience is that faster bytecode typically has fewer instructions. Your original code has 206 instructions, while my version has 164 - from the bytecode perspective, this is one reason why my version is faster.

Regarding packed arrays: I do of course realize that they are always packed when used in Compile. I meant something different, and if you look at my comments again (and the link to the simple example I gave), this should become clearer. What I meant is this: when you index your array elements inside Compile, this results in bytecode instructions. While no unpacking occurs, you still have the overhead of Mathematica's bytecode interpreter indexing into individual elements of your packed array. This is much less than the overhead of the full high-level evaluation loop, of course, and the difference is the speed-up you usually see upon Compiling straightforward procedural code.

However, if you manage to use vectorized operations such as vector assignments or invocations of built-in Listable functions, you bypass the bytecode altogether, and then for these operations there is practically no overhead w.r.t. the C version. This may give up to an order of magnitude speed-up with respect to naive use of Compile, as I demonstrated in the simple example I referred to in my previous comment. Operations such as Clip, UnitStep and Unitize are typical of this technique because, in particular, they can be used to replace If statements and realize control flow in this approach. For this to apply, you must have some parallelism inherent in your problem.

My experience is that you typically gain an order of magnitude speed-up when Compiling high-level Mathematica code, and another order of magnitude when most of your operations inside Compile are vectorized. This confirms the common wisdom that each intermediate language layer costs an order of magnitude in performance. Your code was very well-written; I am usually able to squeeze more out of the code I see around. However, your use of MapThread in a few places was sub-optimal in exactly the way I meant when I mentioned packed arrays: instead of fully utilizing packed arrays, in those particular places you replaced massive simultaneous operations on them with bytecode array indexing (which is what compiling MapThread invocations produces). Since part of your code did utilize the packed arrays fully, the total speed-up in this case was only about twofold. It is substantial nonetheless, given that the vectorized assignments in the loop are in principle much more costly - this is where most of the work is done - and yet they contributed only about 30% of the running time, because they are fully vectorized. At the same time, the (in principle much less intensive) search for the maxima, expressed in bytecode, consumed about 50% of the running time, and this is what I was able to eliminate.
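In F# terms (Jon can correct me here), the masking idea itself would read something like the following sketch; the names unitStep and maskedMax are mine, not a translation of the compiled code above:

// unitStep x is 1.0 when x >= 0.0 and 0.0 otherwise, mirroring UnitStep.
let unitStep x = if x >= 0.0 then 1.0 else 0.0

// Branch-free elementwise selection: take z.[i] where z.[i] >= f.[i]
// and f.[i] otherwise, which is exactly what the mflags trick computes.
let maskedMax (z: float []) (f: float []) =
  Array.init z.Length (fun i ->
    let flag = unitStep (z.[i] - f.[i])
    flag * z.[i] + (1.0 - flag) * f.[i])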

Andreas said...

Leonid,
thanks for digging into this. As I said, your improvements are valuable contributions. When Sal's website is up, we can post your improved version. I suggest we do, so others can benefit from this as well.

However, I don't quite understand what you mean by "certain functions such as Take and some internal iterators are left in byte code uncompiled". I can't see anything remaining un-Compile[]'d in either original version:

DeleteCases[Flatten@americanPutCompiled[[4]],_?NumericQ] is {}.

DeleteCases[Flatten@americanPutCompiledAlt[[4]],_?NumericQ] is {Clip,Clip,UnitStep}.

I must be misunderstanding you?

Leonid said...

@Jon

Jon, I must have been doing the benchmarks wrong. My code is actually 4-5 times faster than the original M code - I just redid the benchmarks on a fresh M7.0 kernel:

In[12]:= (res1 = Table[americanPutCompiled[i, 0.05, 0.4, 3], {i, 1000}]); // Timing

Out[12]= {1.454, Null}

In[13]:= (res2 = Table[americanPutCompiledAlt[i, 0.05, 0.4, 3], {i, 1000}]); // Timing

Out[13]= {0.312, Null}

In[14]:= res1 == res2

Out[14]= True

I hope that this time I am right and the last benchmark was wrong, not vice versa. If so, this further closes the gap - now you get only a factor of 3 speed-up if I use 4 cores, and a factor of 1.5 if M is used with 8 cores (which is presumably possible with a grid license). The latter is a fairer comparison anyway, since you used an 8-core machine.

@Andreas

Andreas, you are right - there are no uncompiled commands in the bytecode, M7.0. My apologies. Also, the breakdown of the number of bytecode instructions I gave was wrong: it is 337 for your code and 235 for mine (measured as Length[americanPutCompiled[[4]]] and Length[americanPutCompiledAlt[[4]]] respectively); please disregard the older numbers.

Regarding posting the improved version on Sal's web site - sure, that's a great idea!

Flying Frog Consultancy Ltd. said...

@Leonid: I am on the road at the moment but I just had a quick play with my F# code on a netbook and made it several times faster here by reducing the number of calls to "exp". I suspect there is even more room for improvement...
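The main saving is that exp(k0 * xi + k1s * float j) factors as exp(k0 * xi) * exp(k1s * float j), so the j-dependent factor can be hoisted out of the inner loop. A minimal sketch of the idea (the name fFactored is illustrative, not my final code):

// Compute the j-dependent factor once per timestep and reuse it for
// every i, instead of recomputing the combined exponential in the loop.
let fFactored k0 k1s a h j =
  let expJ = exp (k1s * float j)  // hoisted out of the loop over i
  fun i ->
    let xi = -a + float i * h
    if xi >= 0.0 then 0.0
    else expJ * exp (k0 * xi) * (1.0 - exp xi)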

"now you get only a factor of 3 speed-up if I use 4 cores, and a factor of 1.5 if M is used with 8 cores". Where did this factor of 1.5 come from?!


@Andreas:

"The prowess of M doesn't come from the "speed" of Compile[]'d code. It comes from its internal algorithms, and these are the best in the world". Absolute nonsense. For example, Mathematica's Fourier was 4x slower than FFTW last I checked.

"This has nothing to do with F#. My Java and C# versions are also much faster". But, of those languages, only F# is both short and fast.

"It's quite unfair". Nonsense.

"To take 10,000 runs of the SAME algorithm in parallel doesn't prove any point". Not true. Functions running in parallel contend for shared resources like main memory bandwidth. The vectorized approach to programming you guys are using to work around Mathematica's performance deficiencies wastes enormous amounts of memory bandwidth and, consequently, is likely to destroy scalability on a multicore as a consequence. Moreover, Leonid only got 3x speedup on 4 cores...

"That is another way to get compiled Java speed directly from the convenience of M, which beats Compile[] easily". Except embedding code in strings is anything but convenient and Java's performance characteristics suck.

Andreas said...

"Absolute nonsense. For example, Mathematica's Fourier was 4x slower than FFTW last I checked."

Absolute nonsense. You are again focussing on the speed issue for numeric computations.

"But, of those languages, only F# is both short and fast."

Agreed, but my sentence was referring to the nonsense you were trying to "explain" in YOUR sentence "This alone makes the F# solution over 10× faster than the original Mathematica." Now you are changing the subject to something else. "Short" is not something your sentence said was missing, and "fast" is almost entirely a platform feature, not really a language feature. I have an issue with the statement "F# is fast" or "C# is fast". Most of the speed hinges on the .NET platform -- language "contributions" to speed are secondary.

"Nonsense."

Nonsense. And not exactly productive to leave it at that.

"Not true. Functions running in parallel contend for shared resources like main memory bandwidth. The vectorized approach to programming you guys are using to work around Mathematica's performance deficiencies wastes enormous amounts of memory bandwidth and, consequently, is likely to destroy scalability on a multicore as a consequence. Moreover, Leonid only got 3x speedup on 4 cores..."

Agreed, but not the point.

"Except embedding code in strings is anything but convenient and Java's performance characteristics suck."

a) You are misunderstanding the meaning of "convenient" as I used it. It's more convenient than having to go to a Java IDE, write the Java code, compile it, and then read the .class or .jar file in with JLink. I say this from the perspective of an M user who wants to outsource certain loop structures to Java.
b) "Java's performance characteristics suck" ... totally uneducated opinion. I recommend you learn a bit more about Java and benchmarking. With proper warm-up and OSR technology (on-stack replacement) we see the first Java implementations actually beating C++ in terms of speed, and we can expect much more, especially in terms of concurrency speed-ups when Java 7 comes out. Just google the web a bit for C++ vs. Java comparisons where Java is using warm-up on a 64-bit system. "Out of the box" C# is always faster than "out of the box" Java (I had mentioned that), but you can configure the Hotspot VM and do warm-up. This also results in dynamic compilation, unlike C#, which ends up in static compilation (never changes again during runtime) whereas the Hotspot VM (re)compiles occasionally. My Java implementations with warm-up regularly beat the (corresponding and almost identical) C# implementations.

OK, this is my last posting on this thread. You seem to have an axe to grind against M, and towards that end you don't even mind writing completely contradictory sentences and sentences that don't address the point being discussed. I refuse to argue with people who refuse the principles of fairness in communication. As I said, I absolutely EMBRACE F#, but it still remains the case that your "reasoning" is completely false.

Leonid said...

@Jon,

Jon, about the speed difference factor: my code runs about 5 times faster than the original on M7.0, and with 4 cores I actually seem to get 4x the speed of a single core. The Timing measurements may be corrupt, but I just measured the timing manually on larger samples. I can fully expect that on 8 cores I will then get an 8x speedup, since parallelizing the Table should be straightforward and scale linearly. Then I get 5x times 8x, which gives a 40x speedup vs. the original version run on a single core. So you get 64/40, which is around 1.5. But OK, let it be 2x, not 1.5x. In any case, this is no longer even a single order of magnitude. OTOH, the fact that M7 comes with only 4 kernels is a question of license, not technology.

Regarding the optimization coming from the factorization of exponents: for the M code this is not the major source of the speedup. The major source of the speed-up was the elimination of MapThread in favor of Clip and UnitStep. Therefore, I am a little surprised that similar optimizations alone can give such a substantial speed-up in the F# code, as you mentioned. It would be interesting to know your final results once you check them.

Flying Frog Consultancy Ltd. said...

@Andreas:

"You are again focussing on the speed issue for numeric computations". Mathematica's Fourier also produces larger numerical errors. So your claim that Mathematica's internal algorithms are "the best in the world" is clearly nonsense. Some of them are good, others are not.

"It's more convenient than having to go to a Java IDE and write Java code, compile it, and then read the .class or .jar file in with JLink". JLink? Why bother interoperating with Mathematica here when it has no relevant benefits?

"totally uneducated opinion". 17× slower than F# because the JVM's implementation of generics sucks and it cannot even express value types. These are serious deficiencies in the foundation Java is built upon.

"My Java implementations with warm-up regularly beat the (corresponding and almost identical) C# implementations". Unverifiable anecdotal evidence. I published my code. Where is yours?

"You seem to have an axe to grind against M". On the contrary, I highly recommend Mathematica for applications where it is good (e.g. teaching children). I just refuse to recommend it for applications like this one where Mathematica sucks beyond belief.


@Leonid:

Printing the current date and time with Date[] makes it clear that Timing gives the wrong results in the context of ParallelTable, so you must switch to AbsoluteTiming to measure the actual elapsed time.

Doing so on this 8-core 2.0GHz E5405 Xeon desktop, Mathematica spews errors, I believe because half of the 8 kernels it spawns automatically die (I am expected to pay Wolfram Research more money to use them), and your new version is only 2.9× faster than the original when both are parallelized.

Moreover, my latest F# code is still 15× faster than your Mathematica and now 160× faster than the original Mathematica code given in Sal Mangano's Mathematica Cookbook...

Flying Frog Consultancy Ltd. said...

The timings are also very sensitive to the parameters hardcoded into the program. If I increase both nn and mm by a factor of 10 and compute strikes from 1 to 1000 then my F# runs 200× faster than your Mathematica (36.5s vs 0.186s) and a whopping 960× faster than the code given in Sal Mangano's Mathematica Cookbook!

Leonid said...

@Jon,

This is quite impressive. I will certainly look into F# at some point. But speed is not everything. There are many reasons to use Mathematica other than speed, and if the speed of some portion of the code is critical, there are many tools (C, Java, etc.) which can be used, F# being just one of them.

F# seems a great tool but, as Andreas and I mentioned earlier, comparing it to Mathematica is not very meaningful. For example, in Mathematica the above function can be immediately plugged into Manipulate and used interactively. Anyway, you wrote a whole article about M-F# interoperability in which you state in the abstract that their combination can be very powerful. I like that argument much better than these posts pitting the tools against each other. You don't have to kick M to prove that F# is good.

Flying Frog Consultancy Ltd. said...

@Leonid:

You only have to look at our previous post to see an example where Mathematica came out on top. We are not the ones being unfair here...

Interesting that you bring up Manipulate. The equivalent functionality is easily obtained in F# simply by adding a slider at the top of the window to control a parameter, and the GUI capabilities of .NET go way beyond what can be done in Mathematica. Performance is even more important in that case because you want a responsive user interface. A large chunk of Sal's book is (quite rightly) devoted to addressing Mathematica's performance woes specifically in the context of Manipulate; F# has much less of a problem because it is so much faster.
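As a rough illustration, here is a minimal WinForms sketch of the slider idea, assuming the americanPut function from this post is in scope; the rendering step is elided because it depends on the charting library you choose:

open System.Windows.Forms

// Illustrative sketch: a TrackBar controls the strike and the handler
// reprices on every change. Replace the body of draw with real rendering.
let form = new Form(Text = "American put")
let slider = new TrackBar(Minimum = 1, Maximum = 1000, Dock = DockStyle.Top)
let draw strike =
  let ss, ps = americanPut (float strike) 0.05 0.4 1.0
  ignore (ss, ps)  // ... render the price curve here ...
  form.Text <- sprintf "American put, strike = %d" strike
slider.ValueChanged.Add(fun _ -> draw slider.Value)
form.Controls.Add slider
Application.Run form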