This page includes questions by the instructor (John Boyland) At least one of the (appropriate) questions will appear on the first midterm. The actual midterm will have four questions, although some may consist of a set of short-answer/review questions.

Any of the review questions from Chapters 1, 2 or 4 may appear on the midterm.

- If one is designing and implementing a new language, what steps are needed?
- What is the difference between a phase of a compiler and a pass of a compiler?
- Briefly describe why one may wish to split a compile into modules corresponding to the phases.
- Give advantages and disadvantages to using tools such as
**flex**and**bison**to implement parts of a compiler. - Describe what the ``front-end'' and ``back-end'' of a compiler are. Why are they distinguished?
- What is a
*checked dynamic semantic error*? - What is
*maximal munch*? When does it come into play? What should you do about it? - Scanners need to return a series of tokens, whereas a DFA just returns Y or N. How does one implement a scanner using a DFA? What problems can you foresee in the implementation?
- What is the difference between a
*deterministic*and a*nondeterministic*automaton ? - Why is it important that the scanner not frequently look at characters multiple times ?
- List four ways in which a lexical error can be handled. What method does the Cool compiler use ?
- What advantages do context-free grammars have over regular expressions for describing programming languages.
- What is a
*derivation*? - What is a
*sentential form*? - What is the
*yield*of a parse tree ? - Compare and contrast top-down and bottom-up parsing in terms of complexity, class of grammars accepted, contents of the parse stack, handling of tokens and type of derivation obtained.
- What are the advantages of bottom-up parsing over top-down parsing ?
- What are FIRST and FOLLOW sets used for in SLL parsing? Which set is used for SLR parsing?
- The simplest way to handle an error is to print a message and then stop. Why should one do more? Describe at least threee different ways to handle errors, pointing out advantages and disadvantages.
- What is a
*shift/reduce*conflict? a*reduce/reduce*conflict? - What is an ambiguous grammar? Give an example.
- What connection is there between
*ambiguous grammars*,*inherently ambiguous languages*and parsing conflicts? - Why don't we ever get shift/shift conflicts in LR parsing ?
- What is the difference between a
*teminator*and a*separator*in a list of items? - What are the differences between context-free grammars and tree grammars?
- What differences does bottom-up versus top-down parsing make for building an abstract syntax tree?
- In a one-pass compiler: how does the model of attribute grammar (S-, versus L-attributed) interact with kind of parsing (LL versus LR)? What role does an AST play?

Construct a regular expression that matches the language of all strings of ones and zeroes which contain exactly 3 zeroes.

Give the NFA construction for the following regular
expression. Then find a DFA for the same
language: `(0*1+)*0*`

Same as above for `(aa|bb)*`.

Construct an automaton for the following regular expression for Ada integers. A nondeterministic automaton is OK.

[0-9](_?[0-9])*(#[0-9A-Fa-f](_?[0-9A-Fa-f])*#)?

Give a regular expression for the following automata
where the set of states is
{`A`,`O`};
`A` is the start state and `O` is the only final state.
The transitions are written as triples:

(The automaton can go fromfrom,chars,to)

- First Automaton
(A,0,A) (A,1,O)

- Second Automaton (where "e" means epsilon)
(A,0,A) (A,e,O) (O,e,A) (O,1,O)

Given the following grammar

a -> t + t t -> f * f | f f -> ID | INT_CONSTconstruct a parse tree for

Given the following grammar, give the LR(0) parse tables (the characteristic finite-state machine (CFSM)):

S -> E E -> E '+' E E -> 'a'

List all the LR items associated with the following grammar. Then construct the CFSM.

S -> expr expr -> expr '*' primary | primary primary -> '7'

It has been proposed to add floating point numbers to
Cool. The proposed form for floating point constants
is that a single dot occurs anywhere within a non-empty string of digits
(including at beginning or end) and that immediately following
*may* appear an exponent of the following form:
an ```e`'' either upper or lower case, then an optional
plus or minus sign and then a nonempty string of digits.
Write a regular expression suitable for `flex`
that matches exactly this set of floating-point constants.
You may use `flex` definitions.

It has been proposed to making the `else` part of
a conditional optional for Cool. Write the modified context-free
grammar production(s). Do we get the ``dangling else''
problem? Explain!

Write the extra productions to the syntax that would be required to use a nicer syntax for arrays in Cool. Make sure your syntax can handle expressions such as

a[1] := a[i] + 1What new tree node constructors would we need? Give their C prototypes, and give the

"/*" [*]* [^*]* [*]* "*/"but it didn't match the following comment:

/* Here Y=2*x+1; */and it ate up this entire line as a single comment:

/**/ y = "Hello, world\n"; /**/

Explain why the first comment isn't matched and the second ``comment'' was matched. Then fix the problem.

Some languages (such as Cool) have line comments; others (such as C) use bracketed comments (see above); C++ has both kinds. Compare and contrast the benefits and problems of each kind. In your answer touch on the following issues:

- The various purposes of comments.
- Whether it is clear what things are comments and what things are not.
- Whether usage is likely to lead to errors.
- How easy they are to implement.

Does there exist a regular expression which describe non-empty strings containing the same number of a's and b's and no other characters? What about a context-free grammar ?

Describe the following DFA in both plain (short!) English and as a regular expression. There are four states {A,B,C,D}; A is the start state and D is the only final state:

(A,1,A) (A,0,B) (B,1,C) (B,0,B) (C,0,D) (C,1,A) (D,0,B) (D,1,C)

Given the following grammar

isexpr -> term | expr '+' term term -> factor | term '*' factor factor -> INT | '(' expr ')'

In language Drool, we have two operators: the dispatch operator
`.`, and the assignment operator `=`,
which is right-associative. The former has a higher precedence than the
latter.
Give a non-ambiguous CFG so that expressions such as the following are
parsed correctly:

z = x = obj1.f1().f2()

Given the following context-free grammar

stmt : if_then | if_then_else | other_stmts if_then : IF condition THEN stmt if_then_else : IF condition THEN stmt ELSE stmtYou may assume that nonterminals

- Show that this context-free grammar is ambiguous.
- Rewrite the grammar so that it is not ambiguous,
*but still accepts the same language*

Prove that the following grammar is ambiguous. Write a grammar that accepts the same language that is not ambiguous. Is the language inherently ambiguous? Is it regular?

S -> z S | Z Z -> | Zz

Is it LL(1) ? LL(k) for any k ? If so, give predict sets. If not, how can it be corrected to be LL(k)? LL(1)?stmtlist -> stmtlist stmt | stmt stmt -> noun iverb adv | noun tverb noun noun -> COMPILERS | I | YOU iverb -> SLEEP | WORK adv -> HARD | WELL tverb -> LOVE | HATE | SEE

Given the context-free gramar

expr1 : expr2 ':' | expr1 ';' ; expr2 : expr2 '%' ID | ID ;Is this grammar LL(1), LR(1).

Given the following grammar fragment for Cool:

expr : ... | IF '( expr ')' expr ELSE expr | expr '+' expr | ... ;When the Cool grammar is processed by

state 146 expr -> IF '(' expr ')' expr ELSE expr . (rule 35) expr -> expr . '+' expr '+' shift, and go to state 92 '+' [reduce using rule 35 (expr)] $default reduce using rule 35 (expr)

Give a sample Cool expression that will lead the parser into such a state when parsing. In this case, what is the correct action for the parser to take?

Describe the ``dangling else'' problem of Pascal and C/C++. Is the problem we see here an instance of the ``dangling else'' problem?

If we don't correct the problem, how will `bison`
resolve the conflict? Will this achieve the correct result for
all inputs?

How can the problem be fixed if we do not rely on `bison`'s
default conflict resolution?

Suppose we wish to write an attribute grammar to do constant
evaluation on Cool ASTs for the purpose of replacing expressions with
literals at compile time. A design of such an attribute grammar with
two synthesized attributes `isc` (true if the expression is
a constant boolean or integer) and `val` (the value if it was
constant, where 0 = false and 1=true for boolean expressions).
Write the rules for the following COOL AST nodes:

- int_const
- bool_const
- string_const
- null
- lt
- add
- mul
- div
- cond

if 3 < 2 then x*(65536/0) else 10*(65536/2) fi