Section notes: Week 7

CS 164, Fall 2005

General

LR(0) parsing

What is the LR(0) state machine for this grammar?

  S' -> S $
  S  -> S S +
  S  -> int

Ambiguities and conflicts

This material is summarized from Chapter 8 of lex & yacc (Levine).

Conflicts can occur because of grammatical ambiguities or just because not every CFG is LALR(1). Part of the reason for understanding the LALR parsing algorithm is so that you can read the output of a parser generator and understand how the comments came about. The function print-lalr-states in lalr.cl is useful for this: look through the states, and see where there are. For example,

CL-USER> (def-jyacc foo (S (S + S)) (S (int)))
Warning, Not LALR(1)!!: STATE-3, S --> S + S 
CL-USER> (print-lalr-states)

; ... skip some states ...

STATE-3:
  S --> S . + S, *EOF* + 
  S --> S + S ., + *EOF* 
    On + shift STATE-2
    On + *EOF* reduce S + S --> S

This is a shift-reduce conflict: on +, we can't decide whether to shift + onto the stack, or perform a reduction. The other type of conflict is a reduce-reduce conflict. When shift-reduce or reduce-reduce conflicts occur, a disambiguation rule is invoked, like use the earlier production, or prefer shift to reduce. Ideally, you want not to have these conflicts; even if you disambiguate, you run the danger of hiding some other problem. But sometimes it's convenient to let the parser generator handle them. In this case, we'd want extra declarations to tell the parser generator what option to choose: these declarations usually specify precedence and associativity.

Common examples of conflicts include:

EBNF and Lisp macros

The BNF accepted by Johnson's YACC is limited. Wouldn't it be nice if we could write productions like these?

    (class-head (class id [ extends id ]?))
    (stmts      ([ stmt ]*))
    (E          (E + Tm or E - Tm))
  
How might we do this?