COMP 311 Principles of Programming Languages (Fall 2004)

Section 16: Final Words on Types

Final Words on Types

There are four more short topics that we will consider before concluding our coverage of types.

Mutable Records

We already had mutable records in LC:

M :== (ref M) | (! M) | (M := M) | ...

Adding these to TLC requires typing rules.

First, we extend the type language:

t :== (tref t) | ...

Given the tref type constructor, the rest is straightforward.

      tenv |- M : t
--------------------------
tenv |- (ref M) : (tref t)

tenv |- M : (tref t)
--------------------
 tenv |- (! M) : t

tenv |- M1 : (tref t)    tenv |- M2 : t
---------------------------------------
       tenv |- (M1 := M2) : int

Since we don't care about the result of an assignment statement, we assume it will always return the integer 13.

This gives us mutation in the core of TLC. However, we made three significant extensions to TLC's type structure along the way: datatype, explicit polymorphism and implicit polymorphism. How does mutation behave in the presence of these extensions?

Adding references to datatype does not cause any difficulties. We gain the ability to create records with mutable fields.
References also integrate well with explicit polymorphism. For instance, we could have a type abstraction like
```
(Lambda (alpha)
  (lambda (x (tref alpha))
    ...))
```
and the combined typing rules suffice to type this.
Now assume we have implicit polymorphism in our language. Recall we have type judgements such as
```
 Tenv |- Bexp [ x/Exp ] : t
----------------------------
Tenv |- (let x Exp Bexp) : t
```

Puzzle: What goes wrong if x does not occur free in Bexp?

Now consider the addition of mutable records. ref is a function; how do we type it? It takes one argument, say alpha, and returns an object of type (tref alpha). ! is a projection, while := takes a (tref alpha) and an alpha, and returns an int.

ref : (forall (alpha) (alpha -> (tref alpha)))
!   : (forall (alpha) ((tref alpha) -> alpha))
:=  : (forall (alpha) ((tref alpha) alpha -> int))

Since we are in an implicitly polymorphic language, we can omit type information at bindings, as in

(let (f (ref (lambda (x) x)))
  (begin
    (:= f (lambda (x) 5))
    (if ((! f) true)
      5
      6)))

In Scheme, this program would return 5. In TLC, the copying rule for let "expands" this into

(begin
  (:= (ref (lambda (x) x)) (lambda (x) 5))
  (if ((! (ref (lambda (x) x))) true)
    5
    6))

which passes the type-checker. However, an evaluation in TLC goes wrong. Instead of a boolean value, an integer will appear in the test position of the conditional during execution. If the implementation performs run-time checks, just to be sure, it will raise a "type fault" exception, contradicting the well-typedness theorem. In not, it might mis-interpret the bit string.

The problem was caused by the copying of non-values. ((ref ...) is a computation that has not yet resulted in a value.) As a result, the type-checker checks as if there were several values, which causes it to go awry. One solution to this problem is to require that the expression bound by let be a syntactic value. This way, type-sharing and execution sharing are never conflated. Andrew Wright (of Rice) proved that this simple solution is correct and practical. His solution is described in "Simple Imperative Polymorphism", Lisp and Symbolic Computation, Vol. 8, No. 4, December 1995, pp. 343-356.

Incidentally, we notice this problem immediately in an explicitly polymorphic system, since we would have to write

(let ((f (Lambda alpha ...)))
  ...)

The Lambda-notation immediately clarifies that the right-hand side of the let is a value, yet that it is not the intended reference value. The evaluation would create two reference cells, which avoids the type-fault, but is not what the programmer wanted.

Subtyping

The simple and polymorphic type disciplines for the lambda calculus both force values to belong to a unique ground type (no type quantification or free type variables). But, in practice we often want one type to be a subtype of another. Any type system for an object-oriented language must include subtyping to support inheritance. Subtyping has been explored in the context of the polymorphically typed lambda-calculus, but the results are not very satisfactory. The resulting type systems are both very complex and surprisingly restrictive (many natural recursive constructions are prohibited). In this world, one record (class) type A is a subset of another type B if A includes all of the members of A and the types of these members are subtypes of the corresponding members' types in B. This form of subtyping is called structural subtyping. Fortunately, there is alternative formulation of subtyping, called nominal subtyping, with much nicer properties. In such a type system, a record (class) A is a subset of another type B if A explicitly declares A as supertype (e.g., superclass or superinterface in Java). The advantage of this approach is that all of the natural recursive type definitions that make sense are permissable. The Java type system supports nominal subtyping.

Implications of Types for Execution

In languages like TLC and C, types implicitly specify the number of bits needed for the storage of an object. During the past few lectures, we have made three significant additions to the type system that give it significant abstraction capabilities not found in TLC.

Consider a datatype declaration like
```
(datatype CI
  ((cons1 (int ...))
   (cons2 (char ...))))
```
Say we have a procedure (lambda (x CI) ...). How much space do we need to allocate for x? Since, in general, we cannot tell which variants will be passed in, we will need to allocate space for the largest of them of them. (This is akin to unions in Pascal.) One common technique is to use an indirection so that x occupies the same amount of space no matter which variant is passed in: one size then fits all. This is commonly known as boxing.
Implicit polymorphism similarly forces boxing.
Finally, consider a procedure such as an explicitly polymorphic identity function:
```
(define I
  (Lambda (alpha)
    (lambda (x alpha)
      x)))
```
To use I on 5, we need to instantiate it first, as in ([I int] 5). We can then create a version of I that accepts integer-sized inputs which can be used with integers; and likewise for other types. Thus, explicit polymorphism is one way to potentially reduce the execution overhead of polymorphism tolerable, saving the expense of boxing at the cost of replicating code. C++ templates use this implementation technique: each instantiation of a template generates a new copy of the template that is separately compiled.

The Two Final Pictures

We have come to the end of our exploration of types, so it would be worthwhile to briefly summarize what we have examined. We shall summarize along two lines: the power and the landscape of typed languages.

Power

Is TLC a good programming language? Surely not -- we can't even express recursion in it unless we add least fixed-point operators to the collection of constants. Another way to support recursion is to add reclet. Another potential way to tackle this problem is the enrich the type system to the point where the usual call-by-name and call-by-value Y operators from the untyped lambda calculus can be typed. Unfortunately, neither explicit nor implicit polymorphism has this capability--unless we use circular unification (eliminating the occurs check) instead of conventional unification. In short, the design of a sophisticated typed languages is a very challenging problem, with some apparently innocuous decisions having significant and overarching effects.

Landscape

The universe of typed programming languages is tri-polar. On one hand, we have languages like C, which have simple type systems and are unsafe. On the other hand, languages like SML and Haskell are polymorphic and safe. An extreme strain of this latter class is ML 2000, which is being designed atop an explicitly polymorphic system. (Languages like C++, Modula-3 and Java sit betwixt these poles, varying both in the power of their type systems and in their safety.)

Finally, on the other hand, we have Scheme, which is uni-typed (in the sense of ML), and is also safe. For languages like Scheme, a "type" system like set-based analysis appears to be more appropriate both at supporting the programmer's intuition and for program optimization. The advantage of Scheme's value-set system over ML's rigorous type system is that all values live in the same type space. Hence, it is possible to circumvent the implied type system if a programmer thinks that doing so is correct. In C this requires casting and is avaliable because C is unsafe. In ML, this requires copying and/or injection/projection over datatypes. In sum, Scheme is safe, yet it allows all the tricks C programmers customarily use to evade the straight-jacket of the type system.

In summary, we have the following landscape:

C                                                         Scheme
simply typed, unsafe                          uni/dt-typed, safe
        C++
			                                    Java
                             subtyping with simple nominal types


                                                             SML
                                                  imp poly, safe

						    Generic Java
                         subtyping with parametric nominal types

With this, we conclude our survey of types.

Back to course website