[PLT logo] Types I, Comp 312

What is a Type?

One of the central tenets of software engineering is that of early error discovery. Many programming languages support this idea with a type system. Roughly speaking, a type is a name for a collection of values and a type declaration is simply an invariant assertion asserting that a program expresssion (variable, function application, etc.) is constrained to return a value of the specified type. A type system encourages the programmer to think about the type of the value that each expression in the program produces.

Suppose the expression

(if <big, ugly expression>
  (5 6)
  a-nice-value)
is embedded deep in a program. If program tests never force the evaluation of <big, ugly expression> to return true, the error in the then branch is never discovered. However, it may still be the case that some input will eventually force the evaluation of (5 6), which will then generate a run-time error. This error could clearly have been avoided if the programmer and/or the language implementation had flagged the value 5 as something that is inappropriate for the function position of an application.

Idea 1: Types are names for sets of data values.

Idea 2: The valid sets of ``input values'' for each program operation can be described in terms of types.

A functional programming language based on the functional core of Scheme is a good pedagogic language for studying type systems. We define the syntax of the language LC by the production rule

M ::= x | c | (M M) | (lambda x M)


where x is any variable name in a given set of
identifiers V and c is any constant is a give set of
constants C.  We will assume that C includes
all integer constants, the if-then-else function (appropriately
curried), all boolean constants, basic integer and boolean functions
like +, <, and and.

We can assign all of the constants in C types in
the set generated by the following inductive definition:

Type = int | bool | Type -> Type
The infix operator -> is called a type constructor, since it builds a more complex type out of simpler ones. The names int and bool respectively denote the set of integer values and the set of boolean values. The constructed type A -> B denotes the set of functions from set A into set B. In the set of constants C, all integer constants have type int, all boolean constants have the type bool, and the basic functions on integer and booleans have their obvious curried fucnction (arrow) types. For exmaple, + has type
int -> (int -> int)
In contrast, the if-then-else operation has many different types since the consequent and alternative expressions can be take from any type. For the sake of clarity and simplicity, we will restrict the usage of if-then-else to the type
bool -> (int -> (int -> int))
Hence every constant in LC has a unique type. Exercise: What is C's syntax for types? In particular, how does it represent function types?

Once we have names for sets of values, we can try to determine to which set the result of an expression belongs and see whether this result makes sense in the given context.

Idea 3: Tell me what the types of the variables in an expression are and I'll tell you what the type of the expression is.

In LC, as in every other language, we have free and bound variable occurrences. If we attach types to every binding occurrence of a variable, we have clearly covered all bound occurrences. This idea suggests a small modification to LC's syntax:

M :== var | (lambda var type M) | (M M) | n | (+ M M)

But what do we do about expressions with free variables? For those we must assume that we are given their types. The association of a free variable with a type is sometimes called a type context or a type environment. Given any expression and a type environment that covers all of its free variables, we can determine the type of the expression and can thus predict what kind of result it will produce.

Here is a type checker for LC, using a straightforward natural recursion formulation.

;; Type check a closed abstract representation of an LC expression
(define TypeCheck
  (lambda (are)
    (TC are (mt))))

;; Type check an open abstract representation of an LC expression
;; are: an abstract LC expression
;; tenv: an environment that associates variables with types
;; result: the type of are in tenv
;; effect: error if the types don't work out
(define TC
  (lambda (are tenv)
    (cond
      ((var? are) (lookup (var-name are)
		    tenv
		    (lambda () (error 'TC "free var: ~s" are))))
      ((const? are) 'int)
      ((add? are) 
	(Type= (TC (add-left are) tenv) 'int (add-left are))
	(Type= (TC (add-right are) tenv) 'int (add-right are)))
      ((if? are)
	(Type= (TC (if-tst are) tenv) 'int (if-tst are))
	(Type= (TC (if-thn are) tenv) (TC (if-els are) tenv) are))
      ((app? are)
	(let ((funT (TC (app-rator are) tenv)))
	  (unless (->? funT)
	    (error 'TC "function type expected for ~s~n  ~s inferred~n"
	      (app-rator are) funT))
	  (Type= (->-domain funT)
	    (argT (TC (app-rand are) tenv))
	    (app-rand are))
	  (->-range funT)))
      ((proc? are)
	(let ((ptype (proc-type are)))
	  (make--> ptype
	    (TC (proc-body are)
	      (extend tenv (proc-param are) ptype)))))
      ((rec? are)
	(let* ((var-type (rec-type are))
		(rtenv (extend tenv (rec-var are) var-type)))
	  (Type= (TC (rec-rhs are) rtenv) var-type (rec-rhs are))
	  (TC (rec-body are) rtenv)))
      (else (error 'TC "impossible: ~s" are)))))

;; Compare two types, raise error if mismatched
;; recd: the type that was inferred for b
;; expected: the expected type (according to context)
;; are: an expression 
;; result: recd if recd and expected are structurally identical types
;; effect: cal to error with are if the types don't match
(define Type=
  (lambda (recd expected b)
    (if (equal? recd expected) recd
	(error 'Type= "expected: ~s; constructed: ~s~n   for ~s"
	  expected recd b))))
Using the type checker we can formulate the following more concrete claim about LC (which is basically a formal version of idea 3):

If (TC M mt-env) = t and if (Eval M mt-env) = V, then (TC V mt-env) = t.

Note: LC satisfies this but most so-called ``strongly typed'' languages don't. In C programs, for example, even a variable declared of some type isn't guaranted to contain a C value of that kind. This property is only true if we accept that C values are bit strings. Pascal and Ada have similar problems.


Typing Rules

In the previous section, we have formulated a type-checker as a program. It is possible to describe a type theory in a more language-independent fashion, and indeed, most modern presentations of type theories use this style. The basic idea of this form of presentation is to establish type judgements of the form

tenv |- M : type
with a formal proof system. In this judgement, tenv is a type environment; the `turnstile' (|-) is read ``proves''; M is a term; the colon (:) reads as ``has the type''; and the emphasized text is the derived type. (We will use emphasis to distinguish information generated, rather than specified.) We then have the following rules:

  • We begin with an axiom that tells us we can extract information reposited in the environment:
        tenv |- x : t    if tenv(x) = t
        

  • Next, we have an inference rule that shows us how we construct function types; this is where type assertions are introduced into the type environment:
              tenv + [x : t] |- M : s
        -----------------------------------
        tenv |- (lambda (x : t) M) : t -> s
        

  • Finally, we show the typing of applications:
        tenv |- F : t -> s    tenv |- A : t
        -----------------------------------
                 tenv |- (F A) : s
        

Hence, inference rules produce proof trees. The text above the lines (called the antecedents) are judgements that need to be established using inference rules and axioms. When we use only axioms, we have finished our job. Notice that the rule for application requires two proofs to establish the antecedent, while the one for procedures requires only one.

cork@cs.rice.edu (adapted from shriram@cs.rice.edu)