Section notes: Week 7

CS 164, Fall 2005

General

HW for today: RPN with swap (include augments)
HW next time: last question from Fall 2002 midterm 1
Question from last time: negation and subtraction
Questions about assignment 3?

LR(0) parsing

What is the LR(0) state machine for this grammar?

  S' -> S $
  S  -> S S +
  S  -> int

Ambiguities and conflicts

This material is summarized from Chapter 8 of lex & yacc (Levine).

Conflicts can occur because of grammatical ambiguities or just because not every CFG is LALR(1). Part of the reason for understanding the LALR parsing algorithm is so that you can read the output of a parser generator and understand how the comments came about. The function print-lalr-states in lalr.cl is useful for this: look through the states, and see where there are. For example,

CL-USER> (def-jyacc foo (S (S + S)) (S (int)))
Warning, Not LALR(1)!!: STATE-3, S --> S + S 
CL-USER> (print-lalr-states)

; ... skip some states ...

STATE-3:
  S --> S . + S, *EOF* + 
  S --> S + S ., + *EOF* 
    On + shift STATE-2
    On + *EOF* reduce S + S --> S

This is a shift-reduce conflict: on +, we can't decide whether to shift + onto the stack, or perform a reduction. The other type of conflict is a reduce-reduce conflict. When shift-reduce or reduce-reduce conflicts occur, a disambiguation rule is invoked, like use the earlier production, or prefer shift to reduce. Ideally, you want not to have these conflicts; even if you disambiguate, you run the danger of hiding some other problem. But sometimes it's convenient to let the parser generator handle them. In this case, we'd want extra declarations to tell the parser generator what option to choose: these declarations usually specify precedence and associativity.

Common examples of conflicts include:

Expression grammars
```
    E -> E + E
    E -> E * E
    E -> int
  
```
This will give you a few shift-reduce conflicts. They can be removed by specifying the associativity and precedence of the operators... or by writing an unambiguous grammar, as we've been doing.

if-then-else

    stmt -> if id then id
    stmt -> if id then id else id

The problem: how do you decide you want the latter?

      (if id then (if id then id) else id)
      (if id then (if id then id else id))

Can assign precedence (choose the second production on conflict), or rewrite to be unambiguous.

Nested loops

    S -> S T | T
    T -> T P | P
    P -> int

Limited lookahead
```
    S   -> id opt [ id ]
    opt -> [ keyword ]
    opt -> 
  
```
Rewrite the grammar -- and if it gets too burdensome, punt and post-process the parse tree.

EBNF and Lisp macros

The BNF accepted by Johnson's YACC is limited. Wouldn't it be nice if we could write productions like these?

    (class-head (class id [ extends id ]?))
    (stmts      ([ stmt ]*))
    (E          (E + Tm or E - Tm))

How might we do this?