Tuesday, 14 December 2010
Parsing mathematical expressions using active patterns
Active patterns allow arbitrary user-defined functions to be used to dissect values during pattern matching. This opens up some interesting possibilities with regard to parsing because it allows active patterns to be used to destructure streams of data when known patterns are encountered.
The following code sample creates several mutually-recursive active patterns that form a recursive descent parser for mathematical expressions:
> let rec (|Term|_|) = function
| Factor(e1, t) ->
let rec aux e1 = function
| '+'::Factor(e2, t) -> aux (e1 + e2) t
| '-'::Factor(e2, t) -> aux (e1 - e2) t
| t -> Some(e1, t)
aux e1 t
| _ -> None
and (|Factor|_|) = function
| '-'::Factor(e, t) -> Some(-e, t)
| Atom(e1, '*'::Factor(e2, t)) -> Some(e1 * e2, t)
| Atom(e, t) -> Some(e, t)
| _ -> None
and (|Atom|_|) = function
| c::t when '0'<=c && c<='9' -> Some(int(string c), t)
| '('::Term(e, ')'::t) -> Some(e, t)
| _ -> None;;
val ( |Term|_| ) : char list -> (int * char list) option
val ( |Factor|_| ) : char list -> (int * char list) option
val ( |Atom|_| ) : char list -> (int * char list) option
For example, the following parses the expression 1 + 2 × (3 - 4) × -5:
> let (Term e) = List.ofSeq "1+2*(3-4)*-5";;
val e : int * char list = (11, )
The F# programming language is descended from the MetaLanguages (ML) family that were specifically bred for metaprogramming including parsing. For more information on metaprogramming with F#, read the following articles in The F#.NET Journal:
- Parsing text with Lex and Yacc (30th September 2007).
- Optimizing a simple bytecode interpreter (31st October 2007).
- Parser combinators (30th November 2007).
- Language-oriented programming: The Term-level Interpreter (31st December 2007).
- Language-oriented programming: Term Rewriting (16th August 2008).
- Run-time code generation using
System.Reflection.Emit(31st August 2008).
- Parsing and visualizing binary Geographic Information System data (30th November 2009).
What difficulties? It definitely can't parse left recursive grammars.
This is actually already parsing a left recursive grammar but it is quite subtle (and I got it wrong on the first attempt!). Specifically, a-b-c must be parsed as (a-b)-c which is left recursion due to left associativity so this special case is handled using the nested aux function that eats sequences of factor-factor from left to right.
Subscribe to Post Comments [Atom]
Links to this post:
Subscribe to Posts [Atom]