CS 654: Introduction to Compilers Spring 2006

Programming Assignment 4
Due Thursday, March 30

Introduction

In this assignment you will implement part of the static semantics of Cool. You will use the abstract syntax tree (AST) built by your parser from assignment 3 to check that a program is in conformance with the Cool specification. In this part of the static semantics, we will only check method bodies and will not handle the object-oriented features of Cool.

This assignment has much more room for design decisions than previous assignments. Your program is correct if it checks programs against the specification. There is no single ``right'' way to do the assignment, but there are many wrong ways. There are a number of standard practices which we think make life easier and we will try to convey them to you. We will suggest certain program structures and if you learn how to use them, they will make your task easier. Whatever you decide to do, be prepared to justify your solution.

You will need to refer to the typing rules, identifier scoping rules, and other restrictions of Cool as defined in the CoolAid. You will also need to add methods and data members to the AST class definitions for this phase. The functions that the tree package provides are documented in the handout for programming assignment 3 and in the header files cool-tree.h and tree.h. You can modify the class definitions by changing (adding things to) cool-tree.h.

Files and Directories

To get the assignment type


make -f /afs/cs.uwm.edu/users/classes/cs654/assignments/PA4/Makefile
in your PA4 directory in afs. This command copies a number of files to your directory, some of them with read-only permission. As usual, you should not modify files that are read-only. Please read and follow the directions in the README file.

The files that you will need to modify are:

As usual, there are other files used in the assignment that are symbolically linked to your directory or are included from /afs/cs.uwm.edu/users/classes/cs654/include/PA4. You should not modify these files. Almost all of these files have been described in previous assignments. The exceptions are ast.flex and ast.y, which implement lexical analysis and parsing for a textual representation of Cool ASTs. Recall that there are two versions of coolc: a normal executable and a shell script that glues separate parser, semantic analysis, and code generation phases together via pipes. Each phase uses dumptype to print the AST to the pipe, and the next phase uses the AST parser to reconstruct the tree data structure.1

All software supplied with this assignment is supported on both linux and sparc machines. Remember to run make clean if you switch architectures.

Testing the Semantic Analyzer

You will need a working scanner and parser to test your semantic analyzer. There is a script mysemant1 that combines a scanner, parser, and a partial semantic analyzer (the parts you do not do) with your partial semantic analyzer. See the README file for instructions on how to use either your own components or the components from coolc. It is wise to test your semantic analyzer with the coolc scanner and parser at least once, because we will grade your semantic analyzer using coolc's scanner and parser.

We have provided you with a -s flag for debugging you semantic analyzer; see the README file for instructions on how to use this flag.

Tree Traversal

As a result of assignment 3, your parser builds abstract syntax trees. The file dumptype.cc illustrates how to traverse the AST and gather information from it. This algorithmic style--a recursive traversal of a complex tree structure--is very important, because it is a very natural way to structure many computations on ASTs.

Your programming task for this assignment is to 1) traverse the tree, 2) manage various pieces of information that you glean from the tree, and 3) use that information to enforce the semantics of Cool. One traversal of the AST is called a ``pass''. It is possible to do this part of the assignment in a single pass over the tree.

If you need new data members and member functions, add them directly in the cool-tree.h file.

Naming and Scoping

A major portion of any semantic checker is the management of names. The specific problem is determining which declaration is in effect for each use of an identifier, especially when names can be reused. For example, if a let expression declares the variable i twice, then wherever i is referenced the semantics of the language specify which declaration is in effect. It is the job of the semantic checker to keep track of which declaration a name refers to.

As discussed in class, a symbol table is usually used to manage names. We provide a module to manage symbols and you are free to do with it as you like. Our module allows you to enter, exit and augment scopes as needed. You will need multiple symbol tables and multiple kinds of symbol tables. Our solution uses four kinds of symbol tables: classes, methods, attributes and objects.

The semant.h file we start you with includes partial definitions of two classes: ClassTable and Environment. The first holds the class namespace (which is global and flat). Objects of the Environment class are intended to hold the object namespace plus has access to the other namespaces of the referencing environment indirectly: an environment has a pointer to the class table. Methods and attributes need to be stored in tables too, but we will defer that part until the next assignment.

Besides the identifier self, which is implicitly bound in every class, there are four ways that an object name can be introduced in Cool:

It is, of course, an error to use any name that has no matching declaration.

Type names are used in six different kinds of places in the AST:

For all but the first, you must set the attribute (see Table 1) that stores the AST for this named type. You will need to look up the name in the class table. Native attributes will use NULL for this attribute.

Remember that neither classes, methods, nor attributes need be declared before use. Think about how this affects your analysis.

Type Checking

Type checking is another major function of the semantic analyzer. The semantic analyzer must check that valid types are declared where required. For example, the return types of methods must be declared. Using this information, the semantic analyzer must also verify that every expression has a valid type according to the type rules. The type rules are discussed in detail in the CoolAid and in lecture.

One difficult issue is what to do if an expression doesn't have a valid type according to the rules. First, an error message should be printed with the line number and a description of what went wrong. It is relatively easy to give informative error messages in the semantic analysis phase, because it is generally obvious what the error is. We expect you to give informative error messages. Second, the semantic analyzer should attempt to recover and continue. A good semantic analyzer will avoid cascading errors using any of several standard techniques. We do expect your semantic analyzer to recover, and it should also make some attempt to avoid cascading error messages. One technique is to assign a special type (compatible with all types) to any expression that cannot otherwise be given a type (we used this method in coolc).

Code Generator Interface

The purpose of the semantic analyzer is not only to check for errors but also to decorate the abstract syntax tree. All the attributes are defined in cool-tree.h as private fields. The following table shows what attributes the code generator needs. The ones marked with a footnote are already done for you. The remaining ones must be assigned (with a minor exception concerning binding).

All of these attributes are pointers of one sort or another. Many of them are pointers to the AST nodes referred to by name in the node. The ``bindings'' are pointers to VarBinding objects which describe the different kinds of variable bindings. You need to compute the bindings of those uses that do not refer to attributes. The latter ones are already assigned for you. So before doing name resolution for identifiers, check to see if the binding is already non-null in which case you should leave it unchanged. The type of every expression is a symbol from the ``id'' symbol table. The type of no_expr nodes should be the symbol _no_type, which can be accessed using global variable No_type. The type of null nodes should be the symbol _void, which can be accessed using global variable Void.

constructor or phylum attribute meaning
program obj_class AST for class Object
  int_class AST for class Integer
  str_class AST for class String
  bool_class AST for class Boolean
class super$^2$ AST for super class (or NULL for Object)
method overrides$^2$ AST of method being overridden (or NULL)
  return_of_class AST for class of return type
attr of_class AST for class of type
formal of_class AST for class of type
branch of_class AST for class of type of branch variable
  type symbol for type of value of expression in branch
Expression type symbol for type of value of expression (or NoType)
let of_class AST for class of type of declare variable
object binding description of variable being used
assign binding description of variable being assigned
dispatch method$^2$ AST for (statically selected) method
static_dispatch method$^2$ AST for method selected
new_ of_class AST for class being instantiated.

Output

For incorrect programs, the output of semantic analysis is error messages. You are expected to recover from all errors that this part of the semantic analyzer finds. You do not need to test your program on programs with inheritance-related errors. You are expected to produce complete and informative error messages. Assuming the inheritance is used correctly, the semantic checker should catch and report all semantic errors in the program. When in doubt, use coolc as a guide in determining what informative error messages should say. Your error messages need not be identical to those of coolc, but making them identical will make it easier for you to see if you have correctly implemented tests (and will make it easier to grade your project).

We have supplied you with a simple error reporting method semant_error. This routine takes a filename and the AST node where the error occurred, and it returns the error stream after it has printed an error header. The filename should be the file in which the error occurs. The Class_ nodes store the file in which the class was defined (recall that class definitions cannot be split across files). In an error message, the line number of the error message is obtained from the AST node where the error is detected and the file name is obtained from the enclosing class. We have provided you with a version of semant_error in class Environment which you should find useful. We have also provided you with semant_warning to make it easy to generate warning messages that do not prevent compilation. You are not required to implement any warnings.

For correct programs, the output is a decorated abstract syntax tree. You will be graded on whether your semantic phase correctly annotates ASTs with types and on whether your semantic phase works correctly with the coolc code generator. You may wish to compare your results with the solution:


grid.cs: coolc -Pps xxx.cl > xxx.sol-out
grid.cs: mysemant1 xxx.cl > xxx.out
grid.cs: diff xxx.sol-out xxx.out

You are also expected to program in good, structured style. You should spend some time thinking about the class definitions you will use.

Summary

In summary, your partial semantic analyzer should find all errors in method bodies except for method lookup errors, and also find errors in attribute types and method return types. You must set the following attributes: the ones named something_class, binding and type, as described in the table.

Remarks

The semantic analysis phase is by far the largest component of the compiler so far. Even divided into two parts, it is easy to get lost when doing it. Ask yourself:

If you can answer these questions for an aspect of Cool, implementing a solution should be straightforward.



Footnotes

... structure.1
One may wonder: Why have two versions of coolc? The ``phased'' version of coolc greatly simplifies the structure of the programming assignments by removing the need to guarantee that your code for one phase link with all components for other phases of the course compiler.
...
\(^2\)These attributes are already computed for you in this assignment. You may need to use them to do the work.


John Tang Boyland
2006-03-15