Python PLY Programming: 4. Python YACC

PLY's parser works on tokens. Ply.yacc is a preliminary for creating parsers. It uses a BNF grammar that describes how those tokens are assembled. Parsers can handle some ambiguity. In this chemical equation example the grammar is ambiguous after reading a chemical symbol. There could be repeat count afterward or not. There are a huge number of techniques to resolve the ambiguity such as as lookahead (see what the next tokens are) and precedence rules (given the choice between "*" and "+" use "*" first).

yacc.py is used to parse language syntax. Before showing an example, there are a few important bits of background that must be mentioned. First, syntax is usually specified in terms of a BNF grammar. For example, if you wanted to parse simple arithmetic expressions, you might first write an unambiguous grammar specification like this:

 
expression : expression + term
           | expression - term
           | term

term       : term * factor
           | term / factor
           | factor

factor     : NUMBER
           | ( expression )

In the grammar, symbols such as NUMBER, +, -, *, and / are known as terminals and correspond to raw input tokens. Identifiers such as term and factor refer to more complex rules, typically comprised of a collection of tokens. These identifiers are known as non-terminals.

If any errors are detected in your grammar specification, yacc.py will produce diagnostic messages and possibly raise an exception. Some of the errors that can be detected include:

Duplicated function names (if more than one rule function have the same name in the grammar file).
Shift/reduce and reduce/reduce conflicts generated by ambiguous grammars.
Badly specified grammar rules.
Infinite recursion (rules that can never terminate).
Unused rules and tokens
Undefined rules and tokens

Python PLY Programming

Wednesday, 18 November 2015

4. Python YACC

No comments:

Post a Comment