Handout # 3
LR Parsing

Definitions

(Examples come from the Homework # 4 grammar. We define terms for LR(0).)

rule
A production from the grammar. Example:

\begin{program}
\w{stmt} $\rightarrow$\ if \w{expr} then \w{stmt}
\end{program}
item
A rule with a `dot' somewhere in it. Example:

\begin{program}
\w{stmt} $\rightarrow$\ if . \w{expr} then \w{stmt}
\end{program}
complete item
An item with a dot at the end of the rule. Example:

\begin{program}
\w{stmt} $\rightarrow$\ if \w{expr} then \w{stmt} .
\end{program}
augmentation
The process of adding an item to a set of items in which a dot occurs before an occurrence of a nonterminal. Items are added for all rules for that nonterminal with the dot at the start. Example: the set

\begin{program}
\w{stmt} $\rightarrow$\ if . \w{expr} then \w{stmt}
\w{stmt} $\rightarrow$\ if . \w{expr} then \w{stmt} else \w{stmt}
\end{program}
includes dots before expr and is augmented by adding the following items:

\begin{program}
\w{expr} $\rightarrow$\ . \w{expr} + \w{expr}
\w{expr} $\rightar...
...rrow$\ . \textrm{ID}
\w{expr} $\rightarrow$\ . \textrm{INT\_CONST}
\end{program}
state
A closed set of items, that is, a set that cannot be further augmented. Example:

\begin{program}
\w{program} $\rightarrow$\ . \w{stmt\_list}
\w{stmt\_list} $\rig...
...
\w{stmt} $\rightarrow$\ . if \w{expr} then \w{stmt} else \w{stmt}
\end{program}
partial tree
The result of reducing a portion of the yield to a single symbol. The symbol may be a terminal, in which case, the partial tree represents just the terminal's value, or it may be a nonterminal, in which case the partial tree represents one or more reductions. The parser never examines the contents of the partial tree. For PA3, the partial tree will either be ignored (such as for operator symbols) or will be an AST (abstract syntax tree). Example:

\begin{program}
\w{stmt}
\vert
\textrm{ID} = \w{expr}
\vert
\textrm{INT\_CONST}
\end{program}
parse stack
A stack of alternating states and partial trees. The deepest element will be a state.

LR Parsing

We add a special production to the grammar:
\begin{program}
\w{SS} $\rightarrow$\ \w{start\_symbol} \$\$
\end{program}
where $$ represents the end-of-file condition.

The following algorithm assumes we are constructing states on the fly. If the states are created ahead of time, some of the steps can be simplified as described in Section 3.

  1. Construct the state that includes the item

    \begin{program}
\w{SS} $\rightarrow$\ . \w{start\_symbol} \$\$
\end{program}
    by augmenting the set until closed. Push this state on the stack. Continue with step 2.
  2. Non-deterministically choose an action:
  3. Look at all items in the state immediately beneath the top partial tree. Construct the set of all items that can be constructed by moving the dot over exactly the symbol of the partial tree. If the set is empty, stop the parse with REJECT. Close the set and push the closed set (a state) on the stack. Go to step 2.
  4. If rule for complete item is the specially added rule, stop the parse with ACCEPT. Otherwise, let $n$ be the number of symbols on the right-hand side of the rule from the stack and pop $2n$ elements from the stack: $n$ states and $n$ partial trees. Discard the popped states, but use the popped partial trees to construct a new partial tree with symbol from the left-hand-side of the rule. Push new partial tree on stack. Go to step 3.
The non-determinism of step 2 can lead to parsing conflicts: shift/reduce conflicts when there is a dot before a terminal in the current state as well as a complete item, and reduce/reduce conflicts when there are more than one complete item. LR(0) has no way to avoid such conflicts.

The ``perfect'' LR parser nondeterministically chooses the action that enables an ACCEPT. Realistic parsers such as SLR(1), LALR(1) and LR(1) use one token of lookahead to determine for each complete item whether the reduction will lead to a parse state that can shift the given token.


Parse Tables

If the states are constructed ahead of time in parse tables, then step 1 consists of pushing state 0 on the stack and step 3 consists of using the current state and symbol to look up the new state from the table. A parse table will also determine the action for step 2 by using lookahead or arbitrarily choosing one action over others. For example, bison-generated parsers choose shift actions over reduce actions and an earlier rule over a later rule. This breaking of conflicts is undesirable.



John Tang Boyland
2004-02-18