[PLT logo] Lecture 3, Comp 311

Free and Bound, Scope and Static Distance:
or, How to Process Abstract Syntax Trees

Almost every programming language has constructs for introducing a new variable and for describing which occurrences of the variable in the rest of the program refer to this particular variable. Thus if a variable is used several times, a programmer can resolve which variable is really meant and can determine the current value of the variable.

For example, in C++ a loop header may introduce a new loop variable and its scope:

for(int i = 0; i < NumOfElements; i++) ... ;

The scope of the variable i is the program text following the = all the way to the end of the loop body. If the loop contains a break construct and if the loop is followed by some statement like

printf("the index where we stopped searching is %d\n", i);
then we know from the rules of scoping that the program probably contains an error. Since the i in the call to printf does not refer to the i that was introduced as the loop variable but to some i that was declared elsewhere, it is unlikely that the value of i in the call to printf is that of the loop index when the loop was terminated.

Note: We reasoned about the program with a vague understanding of its exact execution but with a precise understanding of the scoping rules. These rules are crucial for building a good mental execution model.

Other examples of scoping constructs in C++ include


Exercise: Determine in a C++ book what scope these constructs establish.

Free and Bound, Scope: A Simple Example

Using LC, our little toy language from the previous lecture, we can illustrate how to specify the notion of free and bound occurrences of program variables rigorously. This specification will serve two purposes:

Recall the surface syntax of LC:

Exp = Var | Num | (lambda Var Exp) | (Exp Exp)
Var = Sym \ {'lambda}

If we interpret LC as a sub-language of Scheme, it contains only one binding construct: lambda-expressions. In

(lambda a-variable an-expression)

a-variable is introduced as a new, unique variable whose scope is roughly the entire lambda-expression, which effectively is an-expression.

One way to determine the scope of variable binding constructs is to define when some variable occurs free in some expression. By specifying where a variable occurs free and how this relation is affected by the various constructs, we also define the scope of variable definitions. Here is the definition for LC.

Definition (Free occurrence of a variable in LC):

Let a-var, another-var range over the elements of Var.
Let a-num range over the elements of Num.
Let an-exp, another-exp range over the elements of Exp.

Then a-var occurs free in:

  • another-var if a-var = another-var;
  • (lambda another-var an-exp) if a-var = another-var and a-var occurs free in an-exp.
  • (an-exp another-exp) if it occurs free either in an-exp or in another-exp. The relation a-var occurs free in an-exp is the least relation satisfying the preceding constraints.

    If a-var occurs free in an-exp, then some specific occurrence of a-var in an-exp can be identified as a "free occurrence". In fact, we could adapt the preceding definition to define the related notion "occurrence k of a-var is free in an-exp", but the details are tedious because we must label each occurrence of a-var with a unique index k, or alternatively, we must uniquely identify each occurrence by a path from the root in the abstract syntax tree for an-exp. An occurrence of variable a-var is bound in an-exp iff it is not free in an-exp. The most blatant form of bound occurrence is the second component of a lambda-expression. This particular occurrence of a variable a-var is called a binding occurrence. Each logically distinct variable obviously has one binding occurrence; if a-var occurs twice as the second component of a lambda-expression, it is the name of two distinct variables.

    Note that the definition of the "occurs free" and "free occurence" relations are inductive over expressions precisely because the definition of Exp is itself inductive.


    Exercise: Formulate the notion of bound occurrence explicitly as an inductively defined relation.

    The definitions of free and bound implicitly determine the notion of a scope in LC expressions. Clearly, a lambda expression opens a new scope; or, to put it differently, the scope of the binding occurrence a-var in

    (lambda a-var an-Exp)
    
    is that textual region of an-Exp where a-var might occur free.


    Exercise: Find an expression in which x occurs both free and bound.

    Static Distance Representation

    Given a variable occurrence it is natural for a programmer to ask where the binding occurrence is. This may tell him whether or not some nearby variable occurrence is related, or it may help him to determine something about the value that it stands for. Consider the expression

    (lambda z (lambda x ((lambda x (z (z (z x)))) x)))
    

    The two occurrences of x in the fragment

                                        ... x)))) x) 
    

    are unrelated, but only the binding occurrences for the two can tell them apart.

    For a human being, a representation that includes arrows from bound occurrences to binding occurrences is clearly preferable. We can approximate such graphical representations of programs by replacing variable occurrences with numbers that indicate how far away in the surrounding context the binding construct is. The above expression would translate into

    (lambda z (lambda x ((lambda x (3 (3 (3 1)))) 1)))
    

    Indeed, since the parameters in lambda's are now superfluous, we can omit them completely:

    (lambda (lambda ((lambda (3 (3 (3 1)))) 1)))
    

    This representation is often called either the static distance representation of the term or the "deBruijn (de BROIN) Inotation" for the term. deBruijn is a Dutch mathematician who recognized the theoretical advantages of the notation in the context of idealized programming language called the lambda-calculus, which was invented by Alonzo Church in the 1930's. LC is a slight extension of the lambda-calculus. Although the static distance representation is not particularly helpful for people, it is valuable for compilers and interpreters.

    We could specify the process that replaces variable occurrences with static distances in English along the lines of the above definitions. Instead, we write a program that performs the translation.

    First, recall the abstract representation of the set of LC expressions is given by the equation:

    AE = (make-var Var)
       | (make-const Num)
       | (make-proc Var AR) 
       | (make-app AR AR)
    
    based on the data definitions
    (define-struct var (symbol))
    (define-struct const (number))
    (define-struct proc (var body))
    (define-struct app (rator rand))
    

    Second, the program template for abstract representations is clearly

    (define fAR
      (lambda (an-ar)
        (cond
          ((var? an-ar) ...)
          ((const? an-ar) ...)
          ((proc? an-ar) ... (fAR ... (proc-body an-ar))
                          ... (proc-var an-ar) ...)
          ((app? an-ar) ... (fAR (app-rator an-ar)) ... 
                         ... (fAR (app-rand an-ar)) ...))))
    

    Since the replacement process substitutes a variable by a number that depends on the context of the variable, that is, the syntactic constructions surrounding the variable occurrence, we also need an accumulator. In our case, it suffices to accumulate the variables in binding constructs as we traverse the expression. This means the template can be refined to:

    (define fAR
      (lambda (an-ar binding-vars)
        (cond
          ((var? an-ar) ...)
          ((const? an-ar) ...)
          ((proc? an-ar) 
           ... (fAR (proc-body an-ar)
                     (cons (proc-var an-ar) binding-vars)) ...)
          ((app? an-ar) ... (fAR (app-rator an-ar) binding-vars) ... 
                         ... (fAR (app-rand an-ar) binding-vars) ...))))
    

    Finally, to distinguish the initial abstract representation of "deBruijn" programs from the new one, we introducce two new data definitions:

    (define-struct sdvar (sdc))
    (define-struct sdproc (body))
    

    and define the set of SD programs by the equation:

    SD = (make-sdvar sdc)
       | (make-num Num)
       | (make-sdproc SD) 
       | (make-app AR AR)
    
    We do not introduce new records for constants or applications, because their structure is unchanged.

    We can now complete the translation:

    (define sd
      (lambda (an-ar binding-vars)
        (cond
          ((var? an-ar) (make-sdvar (sdlookup (var-name an-ar) binding-vars)))
          ((const? an-ar) an-ar)
          ((proc? an-ar) 
           (make-sdproc 
    	(sd (proc-body an-ar) (cons (proc-var an-ar) binding-vars))))
          (else
           (make-app (sd (app-rator an-ar) binding-vars) 
    		 (sd (app-rand an-ar) binding-vars))))))
    
    where
    (define sdlookup
      (lambda (a-var vars)
        (cond
          ((null? vars) (error 'sdlookup "free occurrence of ~s" a-var))
          (else (if (eq? (car vars) a-var)
    		1
    		(add1 (sdlookup a-var (cdr vars))))))))
    

    Variables are replaced by their static distance coordinate, which is determined by looking up how deep in the list of binding vars it occurs. If the list does not contain the variable, we signal an error. Constants do not need to be translated. Applications are traversed and re-constructed.

    We can test sd by applying it to the result of parsing some surface syntax and the empty list


    Exercise: What should the output of this be?
    (sd (parse '(lambda z (lambda x ((lambda x (z (z (z x)))) x)))) null)
    

    (The empty list of variables indicates that we consider this to be the complete program with no further context.)

    cork@cs.rice.edu/ (adapted from shriram@cs.rice.edu)