One of the central tenets of software engineering is that of early error discovery. Many programming languages support this idea with a type system. Roughly speaking, a type is a name for a collection of values and a type declaration is simply an invariant assertion asserting that a program expresssion (variable, function application, etc.) is constrained to return a value of the specified type. A type system encourages the programmer to think about the type of the value that each expression in the program produces.
Suppose the expression
(if <big, ugly expression> (5 6) a-nice-value)is embedded deep in a program. If program tests never force the evaluation of
<big, ugly expression>
to return
true, the error in the then branch is never discovered. However, it
may still be the case that some input will eventually force the
evaluation of (5 6)
, which will then generate a run-time
error. This error could clearly have been avoided if the programmer
and/or the language implementation had flagged the value
5
as something that is inappropriate for the function
position of an application.
Idea 1: Types are names for sets of data values.
Idea 2: The valid sets of ``input values'' for each program operation can be described in terms of types.
A functional programming language based on the functional core of Scheme is a good pedagogic language for studying type systems. We define the syntax of the language LC by the production rule
M ::= x | c | (M M) | (lambda x M)where x
is any variable name in a given set of identifiersV
and c is any constant is a give set of constantsC
. We will assume thatC
includes all integer constants, the if-then-else function (appropriately curried), all boolean constants, basic integer and boolean functions like+
,<
, andand
. We can assign all of the constants inC
types in the set generated by the following inductive definition:Type = int | bool | Type -> TypeThe infix operator->
is called a type constructor, since it builds a more complex type out of simpler ones. The namesint
andbool respectively denote the set of integer values and the set of boolean values. The constructed type
A -> B
denotes the set of functions from set A into set B. In the set of constantsC
, all integer constants have typeint
, all boolean constants have the typebool
, and the basic functions on integer and booleans have their obvious curried fucnction (arrow) types. For exmaple,+
has typeint -> (int -> int)In contrast, theif-then-else
operation has many different types since the consequent and alternative expressions can be take from any type. For the sake of clarity and simplicity, we will restrict the usage ofif-then-else
to the typebool -> (int -> (int -> int))Hence every constant in LC has a unique type. Exercise: What is C's syntax for types? In particular, how does it represent function types?
Once we have names for sets of values, we can try to determine to which set the result of an expression belongs and see whether this result makes sense in the given context.
Idea 3: Tell me what the types of the variables in an expression are and I'll tell you what the type of the expression is.
In LC, as in every other language, we have free and bound variable occurrences. If we attach types to every binding occurrence of a variable, we have clearly covered all bound occurrences. This idea suggests a small modification to LC's syntax:
M :== var | (lambda var type M) | (M M) | n | (+ M M)But what do we do about expressions with free variables? For those we must assume that we are given their types. The association of a free variable with a type is sometimes called a type context or a type environment. Given any expression and a type environment that covers all of its free variables, we can determine the type of the expression and can thus predict what kind of result it will produce.
Here is a type checker for LC, using a straightforward natural recursion formulation.
;; Type check a closed abstract representation of an LC expression (define TypeCheck (lambda (are) (TC are (mt)))) ;; Type check an open abstract representation of an LC expression ;; are: an abstract LC expression ;; tenv: an environment that associates variables with types ;; result: the type of are in tenv ;; effect: error if the types don't work out (define TC (lambda (are tenv) (cond ((var? are) (lookup (var-name are) tenv (lambda () (error 'TC "free var: ~s" are)))) ((const? are) 'int) ((add? are) (Type= (TC (add-left are) tenv) 'int (add-left are)) (Type= (TC (add-right are) tenv) 'int (add-right are))) ((if? are) (Type= (TC (if-tst are) tenv) 'int (if-tst are)) (Type= (TC (if-thn are) tenv) (TC (if-els are) tenv) are)) ((app? are) (let ((funT (TC (app-rator are) tenv))) (unless (->? funT) (error 'TC "function type expected for ~s~n ~s inferred~n" (app-rator are) funT)) (Type= (->-domain funT) (argT (TC (app-rand are) tenv)) (app-rand are)) (->-range funT))) ((proc? are) (let ((ptype (proc-type are))) (make--> ptype (TC (proc-body are) (extend tenv (proc-param are) ptype))))) ((rec? are) (let* ((var-type (rec-type are)) (rtenv (extend tenv (rec-var are) var-type))) (Type= (TC (rec-rhs are) rtenv) var-type (rec-rhs are)) (TC (rec-body are) rtenv))) (else (error 'TC "impossible: ~s" are))))) ;; Compare two types, raise error if mismatched ;; recd: the type that was inferred for b ;; expected: the expected type (according to context) ;; are: an expression ;; result: recd if recd and expected are structurally identical types ;; effect: cal to error with are if the types don't match (define Type= (lambda (recd expected b) (if (equal? recd expected) recd (error 'Type= "expected: ~s; constructed: ~s~n for ~s" expected recd b))))Using the type checker we can formulate the following more concrete claim about LC (which is basically a formal version of idea 3):If
(TC M mt-env)
=t
and if(Eval M mt-env)
=V
, then(TC V mt-env)
=t
.Note: LC satisfies this but most so-called ``strongly typed'' languages don't. In C programs, for example, even a variable declared of some type isn't guaranted to contain a C value of that kind. This property is only true if we accept that C values are bit strings. Pascal and Ada have similar problems.
Typing Rules
In the previous section, we have formulated a type-checker as a program. It is possible to describe a type theory in a more language-independent fashion, and indeed, most modern presentations of type theories use this style. The basic idea of this form of presentation is to establish type judgements of the form
tenv |- M : typewith a formal proof system. In this judgement,tenv
is a type environment; the `turnstile' (|-
) is read ``proves'';M
is a term; the colon (:
) reads as ``has the type''; and the emphasized text is the derived type. (We will use emphasis to distinguish information generated, rather than specified.) We then have the following rules:
- We begin with an axiom that tells us we can extract information reposited in the environment:
tenv |- x : t if tenv(x) = t
- Next, we have an inference rule that shows us how we construct function types; this is where type assertions are introduced into the type environment:
tenv + [x : t] |- M : s ----------------------------------- tenv |- (lambda (x : t) M) : t -> s
- Finally, we show the typing of applications:
tenv |- F : t -> s tenv |- A : t ----------------------------------- tenv |- (F A) : sHence, inference rules produce proof trees. The text above the lines (called the antecedents) are judgements that need to be established using inference rules and axioms. When we use only axioms, we have finished our job. Notice that the rule for application requires two proofs to establish the antecedent, while the one for procedures requires only one.
cork@cs.rice.edu (adapted from shriram@cs.rice.edu)