Wednesday, 14 February 2007

Avoiding copying in functional programming languages

This boils down to an absolutely pivotal concept in functional programming
called referential transparency. This concept is one of the main stumbling
blocks for imperative programmers learning FPLs like F#. I
certainly had trouble grokking this concept when I ditched C++.

In essence, when a new immutable data structure is created from an existing
data structure, the new data structure has a lot in common with the original.
An imperative programmer immediately assumes (incorrectly) that the original
data structure must have been copied. In fact, there is never any need to
copy immutable data because you can simply refer back to the original, i.e.
referencing is transparent.

So FPLs effectively handle everything by pointer (mutable or immutable) and
the only copying you'll ever incur is copying pointers. In this case, the
pointers inside the record will be copied, which is nothing to worry about.

In the vast majority of cases, succinct F# code is efficient F# code. As
long as you don't explicitly copy lots of data, there won't be much copying
going on "under the hood".

4 comments:

brett said...

You do need to be careful when you start working with large data structures though. For example: say you have a really long list and you append an element to it. This is going to be an O(N) operation where every pointer in the list gets copied. True, you aren't copying every object in the list but the pointer copies, while cheaper than copying the underlying objects, aren't free. Prepending an element to the front of the list is O(1) though since you all you're doing is creating a pointer to the first node in the old list. So if you can you're better off prepending to lists as opposed to appending. Things get even more complicated with more complicated data structures like trees. So even though referential transparency buys you a lot you still can't go around completely ignoring what is happening to your data (not that I'm saying that you are advocating this, just thought I'd drop a cautionary note). Chris Okasaki's book has a lot of good information on dealing with data in functional languages.

Jon Harrop said...

You're absolutely right, Brett. There's no substitute for finding out the asymptotic algorithmic complexities of any operations that you're considering using. I've actually been meaning to ask the F# development team if they could include that information in the documentation of as many stdlib functions as possible.

Needless to say, my forthcoming book F# for Scientists gives a detailed explanation of the importance of asymptotic complexity as well as tabulating the complexities of all important operations for most of the data structures provided with F#.

With the recent change in the internal representation of F#'s lists to make them compatible with IEnumerable, I've been wondering what it would be like if a different representation of lists was used. Maybe skip lists?

Cheers,
Jon.

Ralf Herbrich said...

I have been using F# quite a lot in my work and I always used sequences (seq<'a>) instead of lists for two reasons:

1. They are much more memory efficient; they effectively only hold one instantiations in memory.
2. They are lazy constructs (which allow me to "stage-up" computations).

The only risk I found is that sequences are not persistent; that is, if the generation of a sequence involves some "random number" generation than it is better to first convert them into an array (Seq.to_array) and then treat the array as a sequence (as every array implements an IEnumerable<'a> = seq<'a>).

chribben said...

If no underlying objects are copied when appending an element to a list but only pointers, how can List.tail "know" which list (original or "appended" list) is referenced in the case the current pointer is pointing to the object before the appended object?
Cheers,
Christian