Tuesday, 14 December 2010

Parsing mathematical expressions using active patterns

Active patterns allow arbitrary user-defined functions to be used to dissect values during pattern matching. This opens up some interesting possibilities with regard to parsing because it allows active patterns to be used to destructure streams of data when known patterns are encountered.

The following code sample creates several mutually-recursive active patterns that form a recursive descent parser for mathematical expressions:

> let rec (|Term|_|) = function
| Factor(e1, t) ->
let rec aux e1 = function
| '+'::Factor(e2, t) -> aux (e1 + e2) t
| '-'::Factor(e2, t) -> aux (e1 - e2) t
| t -> Some(e1, t)
aux e1 t
| _ -> None
and (|Factor|_|) = function
| '-'::Factor(e, t) -> Some(-e, t)
| Atom(e1, '*'::Factor(e2, t)) -> Some(e1 * e2, t)
| Atom(e, t) -> Some(e, t)
| _ -> None
and (|Atom|_|) = function
| c::t when '0'<=c && c<='9' -> Some(int(string c), t)
| '('::Term(e, ')'::t) -> Some(e, t)
| _ -> None;;
val ( |Term|_| ) : char list -> (int * char list) option
val ( |Factor|_| ) : char list -> (int * char list) option
val ( |Atom|_| ) : char list -> (int * char list) option

For example, the following parses the expression 1 + 2 × (3 - 4) × -5:

> let (Term e) = List.ofSeq "1+2*(3-4)*-5";;
val e : int * char list = (11, [])

The F# programming language is descended from the MetaLanguages (ML) family that were specifically bred for metaprogramming including parsing. For more information on metaprogramming with F#, read the following articles in The F#.NET Journal:


LOST said...

And what computation complexity will you get using this technique?
What difficulties? It definitely can't parse left recursive grammars.

Jon Harrop said...

@lost: I'm not sure what the computational complexity of this approach is but the only difference with ordinary recursive descent parsing is that F# probably will not factor out common active subpatterns from one match case to the next.

This is actually already parsing a left recursive grammar but it is quite subtle (and I got it wrong on the first attempt!). Specifically, a-b-c must be parsed as (a-b)-c which is left recursion due to left associativity so this special case is handled using the nested aux function that eats sequences of factor-factor from left to right.