COMP 311 Principles of Programming Languages (Fall 2004)

Lecture 5: A Meta-Interpreter for LC (Meaning)

A Meta-Interpreter for LC (Meaning)

In previous lectures, we talked about defining the semantics "operationally", which means that we defined the semantics by manipulating trees. In this lecture, we talked about defined the semantics by translating LC terms to terms in the meta-language, and we focus on OCaml as a meta-language. Such a semantics is primarily a translation, that is not always easy to implement. But we can still implement an interpreter that looks very much like the translation that we wrote up on the board. Here is the code for one such implementation

type value = 
    Int of int
  | Fun of (value -> value)

let rec ev4 env e =
  match e with
      Cn i -> Int i
    | Vr 0 ->
	(match env with
	     v::vs -> v
	   | _ -> raise (Error "Unbound variable"))
    | Vr n ->
	(match env with
	     v::vs -> ev4 vs (Vr (n-1))
	   | _ -> raise (Error "Unbound variable"))
    | Lm e' -> Fun (fun v -> ev4 (v::env) e')
    | Ap (e1,e2) ->
	(match (ev4 env e1) with
	     (Fun f)  ->
               let v = ev4 env e2 in
		 f v
	   | _ -> raise (Error "Applying non-function"))

In class, we discussed in detail the need for a universal type like the type value used in the code above. We also discussed when it is need in the result of the translation, and when it is needed to express the translation itself.

Now for a more detailed discussion of the same ideas in Scheme. In Lecture 4, we developed a "syntactic" interpreter for LC that relied exclusively on substitution to interpret variables; such interpreters repeatedly rewrite programs in the syntax of the source language until the reduced form is a value. This is a powerful interpretation technique. For instance, even utilities as seemingly far removed from programming languages as the sendmail daemon use it for configuration files.

At the end of Lecture 4, we introduced a different form of interpreter that relies on a table of variable/value pairs, called an environment to interpret variables. Our rationale for introducing this approach was computational efficiency. This approach to interpretation eliminates the high cost of traversing the body of procedure (performing susbstitutions) whenever the procedure is applied.

In this lecture, we will look at a different motivation for environment-based interpreters, which we henceforth call meta-interpreters. A meta-interpreter for LC uses an environment to assign meaning to arbitrary program phrases (expressions) not just closed ones. An environment is a partial function mapping variables names (symbols) to values in the programming language. A meta interpreter takes an environment as an auxiliary argument and relies on this environment to assign meaning to the free variables in the input phrase.

The primary motivation for the term ``meta'' in ``meta-interpreter'' is that a a meta-interpreter assigns meaning to programs by reducing the meaning of program phrases to the meanings of their components. A meta-interpreter assigns meaning to program phrases using the same inductive framework that logicians use to assign meaning to mathematical formulas. This process reduces the meaning of all program phrases to the meaning of a single program written in a metalanguage. If this program happens to be purely functional, then (as we will see later in the course) the meta-interpreter defines the meaning of programs in exactly the same way as logicians assign meaning to formulas in a mathematical theory such as set theory.

In the last lecture, we wrote a meta-interpreter for LC in Scheme that represented environments as lists of variable/value pairs and closures as records containing the procedure text and the closing environment. We also represented LC numbers as Scheme integers. This convention enables us to interpret LC addition as Scheme addition. We can make this interpreter for LC more abstract by representing closures (evaluated lambda-expressions) as Scheme procedures. Then, e can use Scheme application to interpret LC application.

Here is a sketch of MEval, which is essentially our environment-based envEval from last lecture with the representations of environments and closures left unspecified.

(define MEval
  (lambda (M env)
    (cond
      ((var? M) (lookup (var-name M) env))
      ((num? M) (num-num M))
      ((add? M) (+ (MEval (add-left M) env)
		  (MEval (add-right M) env)))
      ((proc? M) (make-closure M env))
      ((app? M) (MApply (MEval (app-rator M) env)
		  (MEval (app-rand M) env))))))

Note: The + operation used above must be chosen with care, since the addition operation in the meta-language won't necessarily be the same as that of the implemented language.

What are the values in LC? There are two: numerals and procedures. Numerals can be represented directly in the meta-language. To avoid a premature choice of representation for closures, we have chosen to use the abstractions make-closure and MApply. Thus, if we ever need to change the interpretation of closures, we can do so without changing the interpreter itself.

Exercise: Which elements of the interpreter would we have to change if we change one of our representation choices?

In the special case when the language we are interpreting is the same as that in which the interpreter is written (for instance, a Scheme interpreter written in Scheme), we call the interpreter meta-circular.

Closure and Environment Representations

The preceding interpreter leaves the representations of closures and environments unspecified.

Closure Representation

In lecture 4, we represented closures as structs (pair) containing an abstract syntax tree M for a lambda expression and an environment assigning a value to each free variable in M.

Let us re-examine the representation of LC closures (function values). The closure type only has two operations:

the "constructor" make-closure: proc Env -> closure that builds the closure for LC proc in environment Env; and
the application operation MApply: closure Val -> Val that applies a closure to a Val (an LC value).

Since Scheme supports procedures as values, we can represent LC procedures using Scheme procedures. Consider the following definition of make-closure which returns a procedure rather than a struct:

; proc Env -> closure
(define make-closure
  (lambda (proc-exp env)
    (lambda (value-for-param)
      (MEval (proc-body proc-exp)
	(extend env (proc-param proc-exp) value-for-param)))))

To use this representation of closures, we must modify the representation of Mapply:

; closure Val -> Val
(define MApply
  (lambda (clos val)
    (clos val)))

Note that the Scheme closure returned by (make-closure proc-exp env) closes over env; env appears free in the body of make-closure.

Abstractly, we can characterize MApply and MEval as follows:

  (MApply (make-closure (make-proc var body) env) val)
= (MEval body (extend env var val))

A representation of closures as structures (code-env pairs) was presented in the preceding lecture. Exactly the same representation works here.

Exercise: Given the declaration
(define-structure (closure P E))
how do we write MApply?

Environment Representation

One part of the interpreter remains unspecified: the representation of environments. Before considering the available alternatives, it is worth-while to consider environments abstractly too. There are three things we need to understand with respect to environments:

the lookup: Var Env -> Val method
the extend: Env Var Val -> Env method for extending an environment with a new binding; and
emptyEnv: Env, the empty environment.

The latter two create new environments, while the first of these extracts information from information. Here are the equations that relate the constructors and the selector:

(lookup var (empty-env)) = (error 'lookup "variable ~a not found" var) 
(lookup var (extend env varN val)) =
  (if (eq? var varN) ;; are var and varN the same symbol
      val
      (lookup var env))

What is a good representation choice for environments? Note that there is only a fixed number of free variables in a given program, and that we can ascertain how many there are before we begin evaluating the program. On the other hand, we can be lax and assume that there can be arbitrarily many free variables. A good representation in the former case is the vector; in the latter case, we might wish to use lists. However, there is at least one more representation.

Representing Environments as Procedures

Since environments are simply extendable functions mapping symbols to values, we can represent them as Scheme procedures (or Java commands).

Consider the following implementations:

(define lookup
  (lambda (var env)
    (env var)))

(define empty-env
  (lambda ()
    (lambda (var)
      (error 'lookup "variable ~a not found" var))))

We can then prove that this implementation satisfies one of the equations that characterize environments:

  (lookup var (empty-env))
= (lookup var ((lambda () 
      (lambda (varN) (error 'lookup "variable ~a not found" varN)))))
= (lookup var (lambda (varN) (error 'lookup "variable ~a not found" varN)))
= ((lambda (varN) (error 'lookup "variable ~a not found" varN)) var)
= (error 'lookup "variable ~a not found" var)

as desired. We can similarly define extend:

(define extend
  (lambda (env varN val)
    (lambda (name)
      (if (eq? name varN)
	val
	(env name)))))

Exercise: Verify that extend and lookup satisfy the above equation.

Binding Constructs

Now suppose we added some new binding constructs to LC. For instance, suppose we added the struct seq-let to our abstract syntax, and defined its behavior as follows:

  (MEval (make-seq-let var rhs body) env)
=
  (MEval body (extend env var
	        (MEval rhs env)))

However, now say we add recursive lexical bindings, represented by the abstract syntax struct rec-let. Then we want

  (MEval (make-rec-let var rhs body)" env)
=
  (MEval body (extend env var
	        (MEval rhs ...)))

where the ... represents the (extend env var ...) term. How can we implement such a construct? We clearly need a way to create an environment that refers to itself. If we represent environments as procedures, we can use recursive procedures to implement this kind of extension.

Exercise: Can we use lists or other representations to accomplish this goal?

Hint: What did we do in Comp 210 to create data structures that refer to themselves?

Back to course website