Wednesday, 18 November 2015

4. Python YACC

PLY's parser works on tokens. Ply.yacc is a preliminary for creating parsers. It uses a BNF grammar that describes how those tokens are assembled. Parsers can handle some ambiguity. In this chemical equation example the grammar is ambiguous after reading a chemical symbol. There could be repeat count afterward or not. There are a huge number of techniques to resolve the ambiguity such as as lookahead (see what the next tokens are) and precedence rules (given the choice between "*" and "+" use "*" first).


yacc.py is used to parse language syntax. Before showing an example, there are a few important bits of background that must be mentioned. First, syntax is usually specified in terms of a BNF grammar. For example, if you wanted to parse simple arithmetic expressions, you might first write an unambiguous grammar specification like this:
 
expression : expression + term
           | expression - term
           | term

term       : term * factor
           | term / factor
           | factor

factor     : NUMBER
           | ( expression )
In the grammar, symbols such as NUMBER+-*, and / are known as terminals and correspond to raw input tokens. Identifiers such as term and factor refer to more complex rules, typically comprised of a collection of tokens. These identifiers are known as non-terminals.

If any errors are detected in your grammar specification, yacc.py will produce diagnostic messages and possibly raise an exception. Some of the errors that can be detected include:

  • Duplicated function names (if more than one rule function have the same name in the grammar file).
  • Shift/reduce and reduce/reduce conflicts generated by ambiguous grammars.
  • Badly specified grammar rules.
  • Infinite recursion (rules that can never terminate).
  • Unused rules and tokens
  • Undefined rules and tokens


No comments:

Post a Comment