Chapter 1. Introduction

Figure 1.1 has very little meaning when first encountered; at this point it merely points forward to pieces of the text that follow. It does effectively make the point that a compiler has some number of "phases" or modules, each of which does some processing and produces one or more data structures at its interface to the next phase. This material must be some kind of representation of the program. Not all the modules named by Appel are necessary, and other compilers may divide or order the phases differently.

We already understand the top-left representation, which is the SOURCE PROGRAM, and we understand (from CS61C) the bottom-right MACHINE LANGUAGE. One major theme of this course, and the one that involves you in programming, is the succession of transformations that move from SOURCE to MACHINE LANGUAGE.

Observe (from CS61A) that you already know another way of implementing a programming language. Probably you wrote pieces of Logo in an interpretive setting. You read in the Logo language and, compressing or skipping all those phases in Figure 1.1, you just executed the program! If you did not work on a Logo evaluator, you certainly looked at the Metacircular Evaluator. Now imagine that you could write an evaluator for Java in Scheme. Then it would not be "metacircular" at all. It might be less confusing, too.

You also know at least two other ways of implementing a programming language P:
1. Make sure that the language P is already implemented on the machines of interest (e.g., it is C).
2. Write a program to take any program in P and translate it, statement by statement, into a program in the C language, thereby reducing the problem to the previous solution.

These are not entirely facetious suggestions. In fact they are far more prevalent (especially solution 1) than the build-your-own-compiler kind of solution, as we discuss in this course.
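To make the "just execute it" idea concrete, here is a minimal sketch (in Python rather than the Scheme used later, and with a tuple representation of my own invention) of an interpreter that skips all the phases of Figure 1.1 and evaluates an expression tree directly:

```python
# A tiny interpreter: expressions are either integers or tuples
# of the form (operator, left, right). No machine code is ever produced;
# the tree is simply walked and executed.

def evaluate(exp):
    """Evaluate an expression tree of numbers and binary operators."""
    if isinstance(exp, int):
        return exp
    op, left, right = exp
    a, b = evaluate(left), evaluate(right)
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError("unknown operator: " + op)

# (1 + 2) * 5, executed immediately
print(evaluate(("*", ("+", 1, 2), 5)))  # prints 15
```

This is the interpreter strategy in miniature: one recursive walk, no intermediate representations between source tree and answer.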
In Table 1.2, not much of what is stated is going to make sense at this point, because it too is forward-looking.

Moving to section 1.3, note that you have already encountered tree representations of expressions, and that these are very conveniently read and written in Lisp. I think the explanation at the bottom of page 8 is not very clear. Using different classes for nodes in a tree seems to me unnatural. Just use a single class in which the head of the node is recorded, e.g. operator=minus or operator=AssignStm. But there is an equivalence to what the authors suggest. More about this later.

Consider the syntax of this straight-line programming language. (It is called straight-line because using this language one cannot specify loops or subroutines, so execution proceeds in a straight line from top to bottom.) There are STM, EXP, EXPLIST, and BINOP categories to be described (later we will know these are the "nonterminal symbols" of the grammar). You probably encountered this notion of a grammar in CS61A; we will use it extensively. The -> can be thought of as a "could be" from a recognition perspective. From a language-generation viewpoint, you can read it as "rewrites as". Thus, according to the first rule in Grammar 1.3 on page 7, any place you see STM (statement) it could be replaced by STM; STM: that is, a (different) statement, followed by a semicolon (;), followed by a statement. The second rule says that any place you see STM you could have id := EXP, where id is some identifier, := is the literal character pair, and EXP stands for an expression, something defined in rules 4-7. This kind of expansion into more meaningful text explains the syntax.

Now, to describe the "informal semantics", we associate with each of the rules in Grammar 1.3 some meaning; for example, the binary operator + means "addition of the pieces to the left and right", so that 3+5 means 8. Are there any other possible semantics? Sure.
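The "single class in which the head of the node is recorded" suggestion can be sketched as follows (in Python for concreteness; the class and field names here are illustrative, not from the book):

```python
# One node class for the whole tree. The "head" of each node is stored
# in the operator field (e.g. "AssignStm", "+", "num"), instead of
# having a separate Java class per grammar rule.

class Node:
    def __init__(self, operator, *children):
        self.operator = operator   # the head: which rule/operator this is
        self.children = children   # subtrees or leaf data

    def __repr__(self):
        parts = [self.operator] + [str(c) for c in self.children]
        return "(" + " ".join(parts) + ")"

# a := 5 + 3, as a uniform tree rather than an AssignStm class instance
stm = Node("AssignStm",
           Node("id", "a"),
           Node("+", Node("num", 5), Node("num", 3)))
print(stm)  # prints (AssignStm (id a) (+ (num 5) (num 3)))
```

Note that printing comes for free with this representation, which is precisely the point being made against the one-class-per-rule style.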
We could assert that + means "string concatenation", as it does in Java. Or we could make up any other meaning at all, like "3+5 means draw a circle on the screen centered at location (3,5)". We are going to have to be more formal later on.

Now how about representing the example program in the computer:

a := 5+3; b := (print(a, a-1), 10*a); print(b)

Before we go further, note that parentheses are used in two different ways: to enclose arguments to print, and to group statements. A slightly different grammar might be clearer to some people, one which produces a program like this:

a := 5+3; b := begin print[a, a-1], 10*a end; print[b]

Some people make a big deal of such syntactic "sugar" or arbitrary details. Lisp has very little syntactic sugar.

Back to representing this. A Scheme fan would probably do this:

(begin (set! a (+ 5 3))
       (set! b (begin (print a (- a 1)) (* 10 a)))
       (print b))

where we have taken the liberty of replacing := with the Scheme function set!. A more long-winded way of representing this in Lisp would reflect the fact that the whole example is a compound statement of two statements, the first of which is an AssignStm whose Exp is an Exp binop Exp, etc. It would look like

(CompoundStm
  (Stm (AssignStm (id a) := (Exp (Exp (num 5)) + (Exp (num 3)))))
  (Stm (CompoundStm (AssignStm (id b) := ....))))

The "natural" way of representing such a tree in Java seems unnatural to me, but it is expressed in Program 1.5 on page 10. There, lots of classes must be made up: each nonterminal is first declared as a public abstract class, say Exp, and each of the particular rules with Exp on the left is encoded in a class that extends Exp. This is fairly useless without a great deal of extra programming. For example, you cannot create or print out such a tree without a great deal of additional effort.
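As a hedged sketch of how small such an interpreter is once you have the tagged-tree representation, here is a Python version of the example program. The tags "seq"/"assign"/"print" are my own shorthand for the book's CompoundStm/AssignStm/PrintStm, and the (print ..., 10*a) sequence-expression is flattened into a print statement followed by an assignment, which has the same effect here:

```python
# Interpret the straight-line example directly from nested tuples.

def run(stm, env):
    """Execute a statement, updating the identifier table env."""
    tag = stm[0]
    if tag == "seq":                       # like CompoundStm
        for s in stm[1:]:
            run(s, env)
    elif tag == "assign":                  # ("assign", name, exp)
        env[stm[1]] = ev(stm[2], env)
    elif tag == "print":                   # ("print", exp, ...)
        print(*[ev(e, env) for e in stm[1:]])

def ev(exp, env):
    """Evaluate an expression: a number, an identifier, or (op, l, r)."""
    if isinstance(exp, int):
        return exp
    if isinstance(exp, str):               # identifier lookup
        return env[exp]
    op, l, r = exp
    a, b = ev(l, env), ev(r, env)
    return {"+": a + b, "-": a - b, "*": a * b}[op]

env = {}
run(("seq",
     ("assign", "a", ("+", 5, 3)),
     ("print", "a", ("-", "a", 1)),
     ("assign", "b", ("*", 10, "a")),
     ("print", "b")),
    env)   # prints: 8 7   then   80
```

The entire interpreter is two small recursive functions, which is the point of the "(eval P) does the job" remark coming up in the next section.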
The program for chapter 1 is a straight-line program interpreter, which, with the proper representation in Scheme, is trivial: if P is the straight-line program above, (eval P) does the job. Note that the mess being discussed on page 13 is a table-lookup routine, not necessary in Lisp or Scheme. Why? First, let's explain what is going on.

If you have an identifier whose printed representation is the string "a", how do you make it possible to set it to 8? Here's how: setting "a" to 8 means putting into a hash table, an association list, or some other data structure the pair ["a", 8]. Later, if someone asks you for the value of the identifier "a", you can look it up in the table and say the value is 8. Similarly, you can change the value in the table.

It is not necessary to do this in Scheme or other Lisp dialects because Scheme does this routinely: when it reads the identifier string "a" it looks it up in the Scheme symbol table. If the symbol does not yet exist there, a new one is created (this is called interning, short for internalizing, I guess). The symbol we refer to as a has associated with it the string "a", which helps us print it out. It also has associated with it some value, like 8, which helps in case someone wants to do arithmetic with it, or print out the value. It might also have other properties associated with it, including what bucket it is in in a hash table, etc.

Thought Exercise: In Lisp, (setf x 8) creates a new symbol x and stores 8 in a place known as (symbol-value 'x). Write new programs mysetf and mysymbol-value that do not work with the built-in symbol table, but create a new one like the table in AWA.
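Interning itself can be illustrated in a few lines (a Python sketch, with invented names; real Lisps hang more properties off a symbol than shown here): one table maps each name to a single symbol object, so reading "a" twice yields the very same symbol.

```python
# A toy symbol table demonstrating interning: each distinct name
# gets exactly one Symbol object, created on first encounter.

_symtab = {}

class Symbol:
    def __init__(self, name):
        self.name = name    # the printed representation, e.g. "a"
        self.value = None   # like Lisp's symbol-value slot

def intern(name):
    """Return the unique symbol for name, creating it if needed."""
    if name not in _symtab:
        _symtab[name] = Symbol(name)
    return _symtab[name]

intern("a").value = 8
print(intern("a") is intern("a"))  # prints True: the same object both times
print(intern("a").value)           # prints 8
```

Because both reads return the same object, "looking up the value of a" is just a field access on that one symbol, with no further table search.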
It should work this way:

(mysetf "x" 8)
(mysetf "y" 4)
(mysetf "plus" (symbol-function '+))     ;grab addition function
(funcall (mysymbol-value "plus")         ;apply + to values of x and y
         (mysymbol-value "x") (mysymbol-value "y"))   ; gives 12

;; in scheme, this would work without funcall, but common lisp doesn't like
;; to evaluate the car of a form to a value which
;; happens to be a function. (subtle point)
((mysymbol-value "plus")                 ;apply + to values of x and y
 (mysymbol-value "x") (mysymbol-value "y"))

;;;; one solution
(setf st (make-hash-table :test #'equal))
(defun mysetf (a b) (setf (gethash a st) b))
(defun mysymbol-value (a) (gethash a st))

;; note, many symbol tables could be present at the same time.
;; we make just one, global one.
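For readers more comfortable outside Lisp, the same solution sketched in Python (names mirror the Lisp version; the "symbol table" is just a dictionary keyed by strings):

```python
# One global symbol table, as in the Lisp solution above.
st = {}

def mysetf(a, b):
    """Bind the name a (a string) to the value b in the table."""
    st[a] = b

def mysymbol_value(a):
    """Look up the value currently bound to the name a."""
    return st[a]

mysetf("x", 8)
mysetf("y", 4)
mysetf("plus", lambda u, v: u + v)   # stand-in for (symbol-function '+)
print(mysymbol_value("plus")(mysymbol_value("x"),
                             mysymbol_value("y")))  # prints 12
```

Note that Python, like Scheme and unlike Common Lisp, is happy to call the result of an arbitrary expression, so no funcall analogue is needed.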