COMP 311 Principles of Programming Languages (Fall 2009)

Supplement to Lecture 5: Eliminating Meta Errors

Eliminating Meta-Errors

Understand thyself.
--Aristotle

Starting with this lecture, we will attempt to eliminate meta-level interpretations of concepts that do not have a widely-understood meaning in mathematics (e.g. integers, function application). The behavior that an interpreter specifies should be independent of the peculiar semantics of the metalanguage. Independence guarantees that the definition exposes all of semantic subtleties of the language (instead of inheriting the subtleties of the metalanguage), making it far more useful as a defining standard.

Errors

Consider what our interpreter for LC might do when we feed it an erroneous input such as

(MEval '(5 6))

Depending on which language we use to implement the interpreter, we will get different results. For instance, a Scheme implementation might report one of the following:

Error: attempt to apply non-procedure 5
apply: not a procedure (type was <fixnum>)

This is an example of meta-level inheritance, wherein the implemented language has inherited a behavior (in this case, for errors) from the implementing language. Another example of this form of inheritance is the treatment of errors in C programs. Depending on the platform (operating system and hardware) that they run on, C programs behave differently when errors arise. A language specification should define when an error arises and, for clarity, what happens when an error occurs.

Consider the following fragment of the interpreter:

((numeral? M) (numeral-n M))
((add? M) (+ (MEval (add-lhs M) env)
	    (MEval (add-rhs M) env)))
((lam? M) (make-closure M env))
((app? M) (Apply (MEval (app-fun M) env) (MEval (app-arg M) env)))

Where do we inherit error behavior from Scheme?

Use of the generic + operation from Scheme. Addition is a widely understood concept from mathematics, but its extension to other forms of data is not. Our interpreter relies on the error checking done by Scheme. Instead of using the Scheme + operation directly, we can instead confirm that both arguments are numbers (using number?) and invoke it only if the tests succeed, signaling an error otherwise.
```
    (define LC-+
      (lambda (m n)
    (if (and (number? m) (number? n))
      (+ m n)
      (LC-error ...))))
    
```

Unguarded use of Scheme's application mechanism: we should instead check that it is a closure that is being applied.

    (define Apply
      (lambda (f a)
    (if (closure? f)
      (let ((body (closure-body f)))
        (MEval (lam-body body)
        (extend (closure-env f) (lam-arg body) a)))
      (LC-error ...))))

Suppose we were to add a conditional statement, if, to our language that interprets 0 as "false" and non-zero integers as "true". Then we might have the following implementation:

((If? M)
 (if (MEval (If-test M) env)
     (MEval (If-then M) env)
     (MEval (If-else M) env)))

Unfortunately, this definition does not work: all Scheme values returned by MEval are acceptable in the test position of an if expression. Moreover, every integer, procedure, and list, is interpreted as "true", so the failing branch will never be taken. We could rewrite the code implementing if as:

(if (=/= (MEval (If-test M) env) 0)
  ...
  ...)

but in the process, we inherit anoteher aspect of Scheme's error-handling, since we cannot be sure the value returned by MEval will always be a number. We can eliminate this dependence in exactly the same way that we did for +.

Question: Why do some languages like Scheme and C pick one value to represent falsehood, and allow all others to represent truth?

Question: By explicitly checking for errors, we have introduced a more subtle dependence on the semantics of Scheme. What is it?

Answer: We have asked questions such as number?, which requires that the data of the metalanguage be "self-identifying".

Self-Identifying Data

Most programming languages do not support operations like number? because data representations do not carry enough information to determine what kind of data is represented! (How can you tell a pointer value from an integer value in C or C++?) Just as we introduced abstract syntax to sever our dependence on the meta-language to represent programs, we can introduce meta-values so that we don't rely entirely on the meta-language to represent values. We can do this with declarations such as

(define-structure (Num n))
(define-structure (Clos param body env))
(define-structure (Bool b))

so now we can now write Apply with Clos?

(define Apply
  (lambda (f a)
    (if (Clos? f)
        ...)))

rather than with procedure?.

Some semanticists argue that this is not an entirely satisfactory solution: since define-structure is a complex construct specific to Scheme, we haven't really explained much by replacing, say, number? with Num?. I disagree because a collection of define-struct declarations simply defines a free term algebra, which is a widely understood mathematical concept. Nevertheless, according to the critics of define-struct, we could instead adopt the convention that every data object is represented by a two-element list consisting of a tag and an value. In this case, we would define the LC number constructor and recognizer as follows:

(define make-Num
  (lambda (n)
    (list 'Num n)))

(define Num?
  (lambda (LC-val)
    (eq? (car LC-val 'Num))))

Question: What other changes do we need to make, eg, in LC-+?

Let us now consider our definition of Num?. The symbol 'Num is easily represented as a number and eq? is a simple hardware instruction. We still need to explain car.

What we just discovered is that self-identifying data rely on the meta-language's ability to allocate data dynamically. This is also true of closures and arrays. Given the importance of this concept, we will next study an explanation of dynamic allocation.

Back to course website