Chapter 1. Introduction

Figure 1.1 has very little meaning when first encountered; at this point it merely points forward to pieces of the text that follow. It does effectively make the point that a compiler has some number of "phases" or modules, each of which does some processing and produces one or more data structures at its interface to the next phase. This material must be some kind of representation of the program. Not all the modules named by Appel are necessary, and other compilers may divide or order the phases differently.

We already understand the top-left representation, which is the SOURCE PROGRAM, and we understand (from CS61C) the bottom-right MACHINE LANGUAGE. One major theme of this course, and the one that involves you in programming, is the succession of transformations that move from SOURCE to MACHINE LANGUAGE.

Observe (from CS61A) that you already know another way of implementing a programming language. Probably you wrote pieces of Logo in an interpretive setting. You read in the Logo language and, compressing or skipping all those phases in Figure 1.1, you just executed the program! If you did not work on a Logo evaluator, you certainly looked at the Metacircular Evaluator. Now imagine that you could write an evaluator for Java in Scheme. Then it would not be "metacircular" at all. It might be less confusing, too.

You also know at least two other ways of implementing a programming language P:
1. Make sure that the language P is already implemented on the machines of interest (e.g., it is C).
2. Write a program to take any program in P and translate it, statement by statement, into a program in the C language, thereby reducing the problem to the previous solution.

These are not entirely facetious suggestions. In fact they are far more prevalent (especially solution 1) than the build-your-own-compiler kind of solution, as we discuss in this course.
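To make the "just execute it" idea concrete, here is a minimal sketch (in Python rather than the Scheme used later, and with a tuple representation of my own invention) of an interpreter that skips all the phases of Figure 1.1 and evaluates an expression tree directly:

```python
# A tiny interpreter: expressions are either integers or tuples
# of the form (operator, left, right). No machine code is ever produced;
# the tree is simply walked and executed.

def evaluate(exp):
    """Evaluate an expression tree of numbers and binary operators."""
    if isinstance(exp, int):
        return exp
    op, left, right = exp
    a, b = evaluate(left), evaluate(right)
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError("unknown operator: " + op)

# (1 + 2) * 5, executed immediately
print(evaluate(("*", ("+", 1, 2), 5)))  # prints 15
```

This is the interpreter strategy in miniature: one recursive walk, no intermediate representations between source tree and answer.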
In Table 1.2, not much of what is stated is going to make sense at this point, because it too is forward-looking.

Moving to section 1.3, note that you have already encountered tree representations of expressions, and that these are very conveniently read and written in Lisp. I think the explanation at the bottom of page 8 is not very clear. Using different classes for nodes in a tree seems to me unnatural. Just use a single class in which the head of the node is recorded, e.g. operator=minus or operator=AssignStm. But there is an equivalence to what the authors suggest. More about this later.

Consider the syntax of this straight-line programming language. (It is called straight-line because using this language one cannot specify loops or subroutines, so execution proceeds in a straight line from top to bottom.) There are STM, EXP, EXPLIST, and BINOP categories to be described (later we will know these are the "nonterminal symbols" of the grammar). You probably encountered this notion of a grammar in CS61A; we will use it extensively. The -> can be thought of as a "could be" from a recognition perspective. From a language-generation viewpoint, you can read it as "rewrites as". Thus, according to the first rule in Grammar 1.3 on page 7, any place you see STM (statement) it could be replaced by STM; STM: that is, a (different) statement, followed by a semicolon (;), followed by a statement. The second rule says that any place you see STM you could have id := EXP, where id is some identifier, := is the literal character pair, and EXP stands for an expression, something defined in rules 4-7. This kind of expansion into more meaningful text explains the syntax.

Now, to describe the "informal semantics", we associate with each of the rules in Grammar 1.3 some meaning; for example, the binary operator + means "addition of the pieces to the left and right", so that 3+5 means 8. Are there any other possible semantics? Sure.
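The "single class in which the head of the node is recorded" suggestion can be sketched as follows (in Python for concreteness; the class and field names here are illustrative, not from the book):

```python
# One node class for the whole tree. The "head" of each node is stored
# in the operator field (e.g. "AssignStm", "+", "num"), instead of
# having a separate Java class per grammar rule.

class Node:
    def __init__(self, operator, *children):
        self.operator = operator   # the head: which rule/operator this is
        self.children = children   # subtrees or leaf data

    def __repr__(self):
        parts = [self.operator] + [str(c) for c in self.children]
        return "(" + " ".join(parts) + ")"

# a := 5 + 3, as a uniform tree rather than an AssignStm class instance
stm = Node("AssignStm",
           Node("id", "a"),
           Node("+", Node("num", 5), Node("num", 3)))
print(stm)  # prints (AssignStm (id a) (+ (num 5) (num 3)))
```

Note that printing comes for free with this representation, which is precisely the point being made against the one-class-per-rule style.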
We could assert that + means "string concatenation", as it does in Java. Or we could make up any other meaning at all, like "3+5 means draw a circle on the screen centered at location (3,5)". We are going to have to be more formal later on.

Now how about representing the example program in the computer:

a := 5+3; b := (print(a, a-1), 10*a); print(b)

Before we go further, note that parentheses are used in two different ways: to enclose arguments to print, and to group statements. A slightly different grammar might be clearer to some people, one which produces a program like this:

a := 5+3; b := begin print[a, a-1], 10*a end; print[b]

Some people make a big deal of such syntactic "sugar" or arbitrary details. Lisp has very little syntactic sugar.

Back to representing this. A Scheme fan would probably do this:

(begin (set! a (+ 5 3))
       (set! b (begin (print a (- a 1)) (* 10 a)))
       (print b))

where we have taken the liberty of replacing := with the Scheme function set!. A more long-winded way of representing this in Lisp would reflect the fact that the whole example is a compound statement of two statements, the first of which is an AssignStm whose Exp is an Exp binop Exp, etc. It would look like

(CompoundStm
  (Stm (AssignStm (id a) := (Exp (Exp (num 5)) + (Exp (num 3)))))
  (Stm (CompoundStm (AssignStm (id b) := ....))))

The "natural" way of representing such a tree in Java seems unnatural to me, but it is expressed in Program 1.5 on page 10. There, lots of classes must be made up: each nonterminal is first declared as a public abstract class, say Exp, and each of the particular rules with Exp on the left is encoded in a class that extends Exp. This is fairly useless without a great deal of extra programming. For example, you cannot create or print out such a tree without a great deal of additional effort.
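As a hedged sketch of how small such an interpreter is once you have the tagged-tree representation, here is a Python version of the example program. The tags "seq"/"assign"/"print" are my own shorthand for the book's CompoundStm/AssignStm/PrintStm, and the (print ..., 10*a) sequence-expression is flattened into a print statement followed by an assignment, which has the same effect here:

```python
# Interpret the straight-line example directly from nested tuples.

def run(stm, env):
    """Execute a statement, updating the identifier table env."""
    tag = stm[0]
    if tag == "seq":                       # like CompoundStm
        for s in stm[1:]:
            run(s, env)
    elif tag == "assign":                  # ("assign", name, exp)
        env[stm[1]] = ev(stm[2], env)
    elif tag == "print":                   # ("print", exp, ...)
        print(*[ev(e, env) for e in stm[1:]])

def ev(exp, env):
    """Evaluate an expression: a number, an identifier, or (op, l, r)."""
    if isinstance(exp, int):
        return exp
    if isinstance(exp, str):               # identifier lookup
        return env[exp]
    op, l, r = exp
    a, b = ev(l, env), ev(r, env)
    return {"+": a + b, "-": a - b, "*": a * b}[op]

env = {}
run(("seq",
     ("assign", "a", ("+", 5, 3)),
     ("print", "a", ("-", "a", 1)),
     ("assign", "b", ("*", 10, "a")),
     ("print", "b")),
    env)   # prints: 8 7   then   80
```

The entire interpreter is two small recursive functions, which is the point of the "(eval P) does the job" remark coming up in the next section.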
The program for chapter 1 is a straight-line program interpreter, which, with the proper representation in Scheme, is trivial: if P is the straight-line program above, (eval P) does the job. Note that the mess being discussed on page 13 is a table-lookup routine, not necessary in Lisp or Scheme. Why? First, let's explain what is going on.

If you have an identifier whose printed representation is the string "a", how do you make it possible to set it to 8? Here's how: setting "a" to 8 means putting into a hash table, an association list, or some other data structure the pair ["a", 8]. Later, if someone asks you for the value of the identifier "a", you can look it up in the table and say the value is 8. Similarly, you can change the value in the table.

It is not necessary to do this in Scheme or other Lisp dialects because Scheme does this routinely: when it reads the identifier string "a" it looks it up in the Scheme symbol table. If the symbol does not yet exist there, a new one is created (this is called interning, short for internalizing, I guess). The symbol we refer to as a has associated with it the string "a", which helps us print it out. It also has associated with it some value, like 8, which helps in case someone wants to do arithmetic with it, or print out the value. It might also have other properties associated with it, including what bucket it is in in a hash table, etc.

Thought Exercise: In Lisp, (setf x 8) creates a new symbol x and stores 8 in a place known as (symbol-value 'x). Write new programs mysetf and mysymbol-value that do not work with the built-in symbol table, but create a new one like the table in AWA.
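Interning itself can be illustrated in a few lines (a Python sketch, with invented names; real Lisps hang more properties off a symbol than shown here): one table maps each name to a single symbol object, so reading "a" twice yields the very same symbol.

```python
# A toy symbol table demonstrating interning: each distinct name
# gets exactly one Symbol object, created on first encounter.

_symtab = {}

class Symbol:
    def __init__(self, name):
        self.name = name    # the printed representation, e.g. "a"
        self.value = None   # like Lisp's symbol-value slot

def intern(name):
    """Return the unique symbol for name, creating it if needed."""
    if name not in _symtab:
        _symtab[name] = Symbol(name)
    return _symtab[name]

intern("a").value = 8
print(intern("a") is intern("a"))  # prints True: the same object both times
print(intern("a").value)           # prints 8
```

Because both reads return the same object, "looking up the value of a" is just a field access on that one symbol, with no further table search.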
It should work this way:

(mysetf "x" 8)
(mysetf "y" 4)
(mysetf "plus" (symbol-function '+))     ;grab addition function
(funcall (mysymbol-value "plus")         ;apply + to values of x and y
         (mysymbol-value "x") (mysymbol-value "y"))   ; gives 12

;; in scheme, this would work without funcall, but common lisp doesn't like
;; to evaluate the car of a form to a value which
;; happens to be a function. (subtle point)
((mysymbol-value "plus")                 ;apply + to values of x and y
 (mysymbol-value "x") (mysymbol-value "y"))

;;;; one solution
(setf st (make-hash-table :test #'equal))
(defun mysetf (a b) (setf (gethash a st) b))
(defun mysymbol-value (a) (gethash a st))

;; note, many symbol tables could be present at the same time.
;; we make just one, global one.
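For readers more comfortable outside Lisp, the same solution sketched in Python (names mirror the Lisp version; the "symbol table" is just a dictionary keyed by strings):

```python
# One global symbol table, as in the Lisp solution above.
st = {}

def mysetf(a, b):
    """Bind the name a (a string) to the value b in the table."""
    st[a] = b

def mysymbol_value(a):
    """Look up the value currently bound to the name a."""
    return st[a]

mysetf("x", 8)
mysetf("y", 4)
mysetf("plus", lambda u, v: u + v)   # stand-in for (symbol-function '+)
print(mysymbol_value("plus")(mysymbol_value("x"),
                             mysymbol_value("y")))  # prints 12
```

Note that Python, like Scheme and unlike Common Lisp, is happy to call the result of an arbitrary expression, so no funcall analogue is needed.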