[PLT logo] Lecture 4, Comp 311

A Syntactic Interpreter for LC (Behavior)

Recall the syntax of LC. We will add a binary operator, +, to the language, giving the following syntax for terms in the language:

M :== x | n | (lambda (x) M) | (M M) | (+ M M)
A proper LC program is an LC expression M that is closed, i,e., contains no free variables. An LC program is any LC expression. Think of LC as being a sub-language of Scheme. For the moment, we will confine our attention to proper programs.

We similarly extend our abstract syntax for LC to include addition as a primitive application. The set AE of abstract representations is defined bv the equation:

AE = (make-var Var)
   | (make-const Num)
   | (make-proc Var AR) 
   | (make-app AR AR)
   | (make-add AR AR)

given the additional data definition:

(define-struct var (name))
(define-struct const (number))
(define-struct proc (param body))
(define-struct app (rator rand))
(define-struct add (left right))

Recall that in Comp 210, we formulated the semantics of Scheme as an extension to the familiar algebraic calculation process. Consider (+ (+ 3 4) (+ 7 5)). To evaluate this, we first determine the value of (+ 3 4) and of (+ 7 5). Why? Because these are not values and + can only add two values (numbers).

What is a value? Intuitively, it is any expression that can be returned as the "answer" produced by a computation. We must understand this concept in detail before we can explain the process of evaluation. Let's consider the various possible forms for expression and determine which are values.

First, lambda terms are values. Are numbers values? Yes, they are. Are identifiers values? No, they stand for values. Are applications values? No, they are not; they trigger an evaluation. Are primitive applications (+ M M) values? No, they also trigger an evaluation. In summary, (M M) and (+ M M) are computations, (lambda ...) and numbers are values, and identifiers are neither (they are not computations because they do not trigger evaluation in our interpreter for LC and they are not values because they are not legitimate "answers" for computations). Anyway, by following the regular rules of evaluation, we determine that the expression above yields the value 19. Let us consider each class of terms individually, but let us ignore variables for the moment.


Rules of Evaluation

Rule 1: For +-expressions, evaluate the operands, then add them.

Rule 2: For applications, substitute the argument for the parameter in the body. Does this argument have to be a value? We shall require that it must.


What happens if we don't require the argument to be a value? Certain computations which we might expect to not terminate do, in fact, yield a value. Consider
((lambda (y) 5) ((lambda (x) (x x)) (lambda (x) (x x))))
The argument given to the procedure does not reduce to a value. However, since the argument need not be evaluated before being substituted for y, the application can be reduced, evaluating to the answer 5.

Hence, we ``patch'' our application rule: For applications, substitute the value of the argument for the parameter in the body. However, the rule still isn't quite satisfactory. Consider the procedure (lambda (x) (lambda (x) x)). When this procedure is applied, we do not want the inner x to be substituted; that should happen only when the inner procedure is applied. Hence, we amend our rule to read,

For applications, substitute the value of the argument for all free instances of the parameter in the body.

We are still not done. Consider the computation

((lambda (x) (+ x x)) 5)
The purpose of an evaluation rule is to yield a value in the end (if one exists); however, as per our current application rule, the result of reducing the computation above is (+ 5 5), which is not a value. Hence, we make the Third Amendment:

For applications, substitute the value of the argument for all free instances of the parameter in the body, and evaluate the resulting expression.

As long as we only evaluate proper programs (closed expressions), the preceding evaluation rules are satisfactory. But they will fail if we try to evaluate arbitrary programs (expressions that may contain free variables) or if we try to use the evaluation rule for applications as a transformation rule to simplify programs. Consider the following reduction of an expression with the free variable x:

  ((lambda (y) (lambda (x) y)) (lambda (z) x))
= (lambda (x) (lambda (z) x))
What happened? The free occurrence of x in the application bound by the inner (lambda (x) ...). This anomaly called "capture of a free variable" is clearly not what we intended. Our evaluation rules will produce legitimate answers (integer values) for some improper programs that should produce run-time errors. Worse, if we try to use the reduction of applications as a transformation rule to simplify programs, we can change the meaning of programs.


Puzzle: Devise a closed LC program P where reducing an application (using our simple substitution rule) inside a lambda produces a program P' that is not semantically equivalent to P (P' produces different answers for some inputs than P does).

There are two ways out of this conundrum:

If we wanted our language LC to resemble Scheme, we would have to use the second alternative above. At this point we deviate from Scheme and assume that a program does not contain any free variables.


Puzzle: What are the equivalent evaluation rules for LC programs that have been converted to static distance (deBruijn) notation? What is the analog of capture when the application rule is used to simplify programs?

An Evaluator

Now we can translate our rules into a program. Here is a sketch of the evaluator:

(define Eval
  (lambda (M)
    (cond
      ((var? M) (impossible ...))
      ((const? M) M)
      ((proc? M) M)
      ((add? M) (add-num
		  (Eval (add-left M))
		  (Eval (add-right M))))
      (else ; (app? M) ==> #t
	(Apply
	  (Eval (app-rator M))
	  (Eval (app-rand M)))))))

(define Apply
  (lambda (a-proc a-value)
    (Eval (substitute a-val ; for
	    (proc-param a-proc) ; in
	    (proc-body a-proc)))))

(define substitute
  (lambda (v x M)
    (cond ; M
      ... cases go here ...
      )))

The key property of this evaluator is that it only manipulates (abstract) syntax. It specifies the meaning of LC by describing how phrases in LC relate to each other. Put differently, the evaluator only specifies the behavior of LC phrases, roughly along the lines of a machine, though the individual machine states are programs. Hence, this evaluator is a syntactic method for describing meaning.


Toward Assigning Meaning to Phrases

From the perspective of implementation efficiency, our evaluator (Eval) has some problems. Consider its behavior on an input like

((lambda (x) <big-proc>) 5)
Assume that <big-proc> consists of many uses of x. The evaluator must step through this entire procedure body, replacing every occurrence of x by the value 5. The new tree produced by this process has the same size as the original tree (since we traversed the entire tree, and replaced an identifier with a value). What does eval do next? It walks once again over the entirety of the new tree that it just produced after substitution, for the purpose of evaluating it. This two phase process is clearly very wasteful, particularly since it involves copying the original abstract syntax tree for <big-proc>.

To be more frugal, eval could instead do the following: it could merge the two traversals into one, which is carried out when evaluating the (as yet unsubstituted) tree. During this traversal, it could carry along a table of the necessary substitutions, which it could then perform just before evaluating an identifier. This table of delayed substitutions is called an environment.

Hence, our evaluator now takes two arguments: the expression and an environment. The environment env contains a list of the identifiers that are free in the body, and the values associated with them.

(define Eval
  (lambda (M env)
    (cond
      ((var? M) lookup M's name in env)
      ((const? M) ...)
      ((proc? M) ... what do we do here? ...)
      ((add? M) ...)
      (else ... ))))

The interesting clause in this definition is the code for evaluating procedures. To preserve the semantics of our evaluation rules, we must keep track of the environment active at the time the procedure was evaluated. Why? Because this environment contains the correct substitutions. When the procedure is finally applied (possibly more than once), the current environment will generally contain the WRONG substitutions.


Exercise: Devise a simple example, where the environment at the point of application contains a WRONG substutition for the body of the procedure being applied. Hint: use the same formal parameter name in a function being passed as an argument and in the function it is being passsed to.
The process of building a procedure representation that includes the correct environment is called "building a closure" or "closing over the environment of the procedure". Unfortunately, many programming languages with procedures as arguments have historically failed to recognize the need to construct closures. Keeping track of the environment for a procedure requires a lot of extra machinery if the interpreter is written in low level language like C. In addition, many language implementors unfamiliar with functions as arguments have failed to realize that failing to keep track of the procedures "lexical" environment is inconsistent with the standard substitution based evaluation rules. If procedures are not closed, the interpreter will implement an ugly semantic model called "dynamic scoping". The history of programming languages is replete with languages that fail to construct closures and hence implement dynamic scoping. They include SNOBOL, APL, some early dialects of LISP, and tcl. To finish our environment-based interpreter we need to choose representations for environments and closures and then fill in the gaps in the template above. For the moment, we will make the simplest, most pedestrian representational choices. In the next lecture, we will consider alternatives. An environment is a table mapping variables to substitutions. A simple choice for a representation is a list of pairs. Similarly a simple representation for a closure is a structure containing the procedure text and the environment. Given these choices our interpreter becomes:
(define-struct pair (var val))
(define-struct closure (body env))


(define Eval
  (lambda (M env)
    (cond
      ((var? M) (lookup M env))
      ((const? M) M)
      ((proc? M) (make-closure M env))
      ((add? M) (add (Eval (add-left M) env) (Eval (add-right M) env)))
      (else (Apply (Eval (app-rator M) env) (Eval (app-rand M) env))))))
         
(define Apply
  (lambda (fn arg) ; fn must be a closure
    (Eval (proc-body (closure-body fn)) 
          (extend (closure-env fn) (proc-param (closure-body)) arg))))

(define lookup
  (lambda (var env)
    (if (null? env) 
        (error ...)
        (if (eq? var (pair-var (car env)))
            val
            (lookup var (cdr env)))))) 

Summary

We have


Solution to Earlier Exercise

Let the form

(let (v M) N)}
abbreviate the LC expression
((lambda (v) N) M)
This abbreviation makes LC code much easier to read.

If an LC evaluator does not build closures to represent procedure values, it will compute the wrong answer for the following program:

(let (y 17)
     (let (f (lambda (x) (+ y y)))
            (let (y 2)
	         (f 0))))
This program abbreviates the LC program
((lambda (y)
  ((lambda (f)
    ((lambda (y)
      (f 0))
     2))
   (lambda (x) (+ y y)))
 17))
Given this program, a closure-based LC interpreter will
  1. bind y to 17 and evaluate
    (let f ...)
    in the resulting environment
    (y = 17)
    
  2. bind f to the closure [(lambda (x) (+ y y)), (y = 17)] and evaluate
    (let (y 2) ...)
    
    in the extended environment
    (f = [(lambda (x) (+ y y)), (y = 17)]; y = 17)
    
  3. bind y to 2 and evaluate
    (f 0)
    
    in the extended environment
    ((y = 2; f = [(lambda (x) (+ y y)), (y = 17)]; y = 17)
    
  4. lookup f to determine that f is the closure
    [(lambda (x) (+ y y)), (y = 17)]
    
    and evaluate this closure applied to the value 0

  5. evaluate the closure's procedure body (+ y y) in the extended closure environment
    (x = 0; y = 17)
  6. return the value 34.

An incorrect interpreter that fails to build closures will perform the following computation for the same program:

  1. bind y to 17 and evaluate
    (let f ...)
    
    in the resulting environment
    (y = 17)
    
  2. bind f to the proc syntax (lambda (x) (+ y y)) and evaluate
    (let (y 2) ...)
    
    in the extended environment
    (f = (lambda (x) (+ y y); y = 17)
    
  3. bind y to 2 and evaluate
    (f 0)
    
    in the extended environment
    (y = 2; f = (lambda (x) (+ y y)); y = 17)
    
  4. lookup f to determine that f is the expression text
    (lambda (x) (+ y y))
    
    and evaluate this expression applied to the value 0 in the environment
    (y = 2; f = (lambda (x) (+ y y)); y = 17)
    
  5. evaluate the body (+ y y) in the extended environment
    (x = 0; y = 2; f = (lambda (x) (+ y y)); y = 17)
    
  6. return the value 4.
cork@cs.rice.edu/ (adapted from shriram@cs.rice.edu)