Homework # 2
due 9/28/04

This homework will have you modifying a Bison parser to implement the substring parsing algorithm by Bates and Lavie. The substring parser will be started at the first parse error and will continue until the end of the (Cool) file. You will use the CompSci 654 Cool 2004 parser. Do not distribute this (solution!) parser (or any of the other solution files) to anyone outside the class.

Bison Parsers

Bison produces table-driven LALR(1) parsers in an output file that has three logical parts:

  1. Code from cool.y. This includes the code in the %{ ...%} sections as well as the actions. The former code occurs at the beginning of the output file, whereas the action code occurs in a large switch statement near the end of the file.
  2. Parse tables constructed from the grammar. These tables are arrays indexed by several different ranges:
    yytranslate
    this takes a terminal symbol (either a single character or the #define'd values used for keywords and multiple character tokens) and maps it into a symbol number. The symbol numbers are:
    0
    EOF
    1
    error
    2
    undefined (The lexer should never return such a token.)
    YYNTBASE
    (The first non-terminal.)
    YYNTBASE+YYNNTS
    (The first illegal symbol number.)
    yyprhs
    This table indirects a rule number (that is a grammar production) into a right hand side index.
    yyrhs
    Given a right hand side index $r$, this array provides the symbols on the right-hand side for the rule. yyrhs+$r$ is a zero-terminated array (string) of symbol numbers.
    yyrline
    This is used for debugging and indicates the line of rule in the grammar (it has nothing to do with line numbers of the file being parsed). You can ignore it.
    yytname
    This gives the human-readable name for a symbol index. You'll want to use this to debug your parser.
    yyr1
    This gives the nonterminal (a symbol index) that a rule belongs to (given a rule index).
    yyr2
    This gives the length of a rule given a rule index.
    yydefact
    This array is indexed by state number (between 0 and YYLAST inclusive) and gives the default action for this state. The action is either 0 (error) or a rule number for a reduction.
    yydefgoto
    The table is indexed by a nonterminal index (the symbol index minus YYNTBASE). This nonterminal is the one resulting from a reduction. It gives the new state we go to after a reduction.
    yypact
    The table yypact is indexed by state index and tells the parser what to do in this state. If the value is YYFLAG, then the input is irrelevant, and we use the yydefact table (see above). Otherwise, the value is added to the input token (after it is translated into a symbol index) and used to index into yytable which is essentially a two-d array where rows are squashed on top of each other for compression. To make sure we really have a good entry, the index needs to pass three checks:
    1. The addition must not be negative.
    2. The addition must not be bigger than the table.
    3. The table yycheck must have the same input symbol as we were just looking at.
    If any of these checks fail, the input token is ignored, we use the default action. Otherwise, the entry in the table yyn gives the parse action:
    Negative
    reduce: -yyn is rule number.
    Positive
    shift: yyn is new state.
    0 or YYFLAG
    error.
    yypgoto
    This array is indexed by nonterminal index. The nonterminal is the one resulting from the reduction. The result is added to the uncovered state (the one to 'goto' from) to get an index into yytable. If the result passes the checks above (except that we check that yycheck of the index is the state), the value in the table is the new state. Otherwise, we use the default goto table (see above).
    yytable
    This is two overlaid compressed two-d arrays giving parse actions and goto entries.
    yycheck
    Check for compressed tables (see descriptions of yypact and yypgoto above).
  3. The table driver, copied from bison.simple. This table driver is commented and includes code to use the tables as indicated above. You should read the main parsing engine starting at the comment:
    /* yyresume: */

Substring Parsing

Read the paper by Bates and Lavie. Implementing the algorithm will require new data structures as well as code that uses them and the tables described above. You will need to declare structures to holds nodes, root lists and state sets. Use the simplest structures you can think of. You may use the STL if you like. Do not attempt to optimize.

Substring parsing is an error-recovery technique. It should not be used unless a parse error happens. Right after the point where an error message is printed, at yyerrlab1, call your substring parser which should start parsing with the token that caused the error. Continue parsing (without invoking any actions). Whenever an error occurs, start another parse with the token that caused the previous parse to fail. When this is done, call YYABORT.

As an exception, if a token is read that does not occur anywhere in the grammar, then this token should be skipped before starting the substring parsing. The detection of this condition should be done in the startup-phase of the substring parser. When it happens, a non-empty string of such illegal tokens should be collected and accepted as a special-case substring with a special error message before normal substring parsing continues.

Your code will need to be inserted in bison.substring, copied from the normal bison.simple. The first task your substring parser will do is build the long goto table. Then it can start the process.

What you need to do

Get set up by running


make -f /afs/cs.uwm.edu/users/classes/cs754/src/homework2/Makefile
This command will copy in everything you need. It will create a file bison.substring that is a copy of the standard bison.simple. You will need to edit it to add the substring parser. Then make parsetest to get an executable. Test it on the examples that will be installed in
/afs/cs.uwm.edu/users/classes/cs754/src/homework2/.
Report how big the forest stack gets: specifically
  1. maximum depth of any node in the FSS;
  2. maximum number of roots in the FSS;
  3. maximum number of nodes in the FSS.

Non-Solaris Architectures

The .a libraries include the following relevant objects:

If you are not working on Solaris (SPARC or INTEL), you will need to compile these yourself, and create your own libraries. I provide a makefile target my-own-libs that does this. It assumes the ar command takes -cr flags. You may need to modify it for your brand of Unix.

What you need to turn in

You should turn in the report in hard-copy, and leave the files (especially bison.substring) in your AFS volume.



John Tang Boyland
2004-09-14