PLY's parser works on tokens. Ply.yacc is a preliminary for creating parsers. It uses a BNF grammar that describes how those tokens are assembled. Parsers can handle some ambiguity. In this chemical equation example the grammar is ambiguous after reading a chemical symbol. There could be repeat count afterward or not. There are a huge number of techniques to resolve the ambiguity such as as lookahead (see what the next tokens are) and precedence rules (given the choice between "*" and "+" use "*" first).
yacc.py is used to parse language syntax. Before showing an example, there are a few important bits of background that must be mentioned. First, syntax is usually specified in terms of a BNF grammar. For example, if you wanted to parse simple arithmetic expressions, you might first write an unambiguous grammar specification like this:
expression : expression + term | expression - term | term term : term * factor | term / factor | factor factor : NUMBER | ( expression )
In the grammar, symbols such as NUMBER, +, -, *, and / are known as terminals and correspond to raw input tokens. Identifiers such as term and factor refer to more complex rules, typically comprised of a collection of tokens. These identifiers are known as non-terminals.
If any errors are detected in your grammar specification, yacc.py will produce diagnostic messages and possibly raise an exception. Some of the errors that can be detected include:
- Duplicated function names (if more than one rule function have the same name in the grammar file).
- Shift/reduce and reduce/reduce conflicts generated by ambiguous grammars.
- Badly specified grammar rules.
- Infinite recursion (rules that can never terminate).
- Unused rules and tokens
- Undefined rules and tokens
No comments:
Post a Comment