Program Engineering: Types I

Types I, Comp 312

What is a Type?

One of the central tenets of software engineering is that of early error discovery. Many programming languages support this idea with a type system. Roughly speaking, a type is a name for a collection of values and a type declaration is simply an invariant assertion asserting that a program expresssion (variable, function application, etc.) is constrained to return a value of the specified type. A type system encourages the programmer to think about the type of the value that each expression in the program produces.

Suppose the expression

(if <big, ugly expression>
  (5 6)
  a-nice-value)

is embedded deep in a program. If program tests never force the evaluation of <big, ugly expression> to return true, the error in the then branch is never discovered. However, it may still be the case that some input will eventually force the evaluation of (5 6), which will then generate a run-time error. This error could clearly have been avoided if the programmer and/or the language implementation had flagged the value 5 as something that is inappropriate for the function position of an application.

Idea 1: Types are names for sets of data values.

Idea 2: The valid sets of ``input values'' for each program operation can be described in terms of types.

A functional programming language based on the functional core of Scheme is a good pedagogic language for studying type systems. We define the syntax of the language LC by the production rule

M ::= x | c | (M M) | (lambda x M)


where x is any variable name in a given set of
identifiers V and c is any constant is a give set of
constants C.  We will assume that C includes
all integer constants, the if-then-else function (appropriately
curried), all boolean constants, basic integer and boolean functions
like +, <, and and.

We can assign all of the constants in C types in
the set generated by the following inductive definition:

Type = int | bool | Type -> Type

The infix operator -> is called a type constructor, 
since it builds a more complex type out of simpler ones.  The names
int and bool respectively
denote the set of integer values and the set of boolean values.
The constructed type A -> B denotes the set of
functions from set A into set B.

In the set of constants C, all integer constants have type
int, all boolean constants have the type bool,
and the basic functions on integer and booleans have their obvious
curried fucnction (arrow) types.  For exmaple, + has type

int -> (int -> int)


In contrast, the if-then-else operation has many different
types since the consequent and alternative expressions can be take
from any type.  For the sake of clarity and simplicity,
we will restrict the usage of if-then-else to the type
bool -> (int -> (int -> int))

Hence every constant in LC has a unique type.

Exercise: What is C's syntax for types?  In particular, how
does it represent function types?





Once we have names for sets of values, we can try to determine to
which set the result of an expression belongs and see whether this
result makes sense in the given context.



Idea 3: Tell me what the types of the variables in an
expression are and I'll tell you what the type of the expression is.



In LC, as in every other language, we have free and bound variable
occurrences.  If we attach types to every binding occurrence of a
variable, we have clearly covered all bound occurrences.  This idea
suggests a small modification to LC's syntax:

M :== var | (lambda var type M) | (M M) | n | (+ M M)




But what do we do about expressions with free variables?  For those we
must assume that we are given their types.  The association of a free
variable with a type is sometimes called a type context or a
type environment.  Given any expression and a type
environment that covers all of its free variables, we can determine
the type of the expression and can thus predict what kind of result it
will produce.



Here is a type checker for LC, using a straightforward natural 
recursion formulation.

;; Type check a closed abstract representation of an LC expression
(define TypeCheck
  (lambda (are)
    (TC are (mt))))

;; Type check an open abstract representation of an LC expression
;; are: an abstract LC expression
;; tenv: an environment that associates variables with types
;; result: the type of are in tenv
;; effect: error if the types don't work out
(define TC
  (lambda (are tenv)
    (cond
      ((var? are) (lookup (var-name are)
		    tenv
		    (lambda () (error 'TC "free var: ~s" are))))
      ((const? are) 'int)
      ((add? are) 
	(Type= (TC (add-left are) tenv) 'int (add-left are))
	(Type= (TC (add-right are) tenv) 'int (add-right are)))
      ((if? are)
	(Type= (TC (if-tst are) tenv) 'int (if-tst are))
	(Type= (TC (if-thn are) tenv) (TC (if-els are) tenv) are))
      ((app? are)
	(let ((funT (TC (app-rator are) tenv)))
	  (unless (->? funT)
	    (error 'TC "function type expected for ~s~n  ~s inferred~n"
	      (app-rator are) funT))
	  (Type= (->-domain funT)
	    (argT (TC (app-rand are) tenv))
	    (app-rand are))
	  (->-range funT)))
      ((proc? are)
	(let ((ptype (proc-type are)))
	  (make--> ptype
	    (TC (proc-body are)
	      (extend tenv (proc-param are) ptype)))))
      ((rec? are)
	(let* ((var-type (rec-type are))
		(rtenv (extend tenv (rec-var are) var-type)))
	  (Type= (TC (rec-rhs are) rtenv) var-type (rec-rhs are))
	  (TC (rec-body are) rtenv)))
      (else (error 'TC "impossible: ~s" are)))))

;; Compare two types, raise error if mismatched
;; recd: the type that was inferred for b
;; expected: the expected type (according to context)
;; are: an expression 
;; result: recd if recd and expected are structurally identical types
;; effect: cal to error with are if the types don't match
(define Type=
  (lambda (recd expected b)
    (if (equal? recd expected) recd
	(error 'Type= "expected: ~s; constructed: ~s~n   for ~s"
	  expected recd b))))


Using the type checker we can formulate the following more 
concrete claim about LC (which is basically a formal version 
of idea 3): 



If (TC M mt-env) = t and if (Eval M
mt-env) = V, then (TC V mt-env) =
t.



Note: LC satisfies this but most so-called ``strongly typed''
languages don't.  In C programs, for example, even a variable declared
of some type isn't guaranted to contain a C value of that kind.  This
property is only true if we accept that C values are bit strings.
Pascal and Ada have similar problems.







Typing Rules



In the previous section, we have formulated a type-checker as a
program.  It is possible to describe a type theory in a more
language-independent fashion, and indeed, most modern presentations of
type theories use this style.  The basic idea of this form of
presentation is to establish type judgements of the form

tenv |- M : type


with a formal proof system.  In this judgement, tenv is a
type environment; the `turnstile' (|-) is read
``proves''; M is a term; the colon (:) reads
as ``has the type''; and the emphasized text is the derived
type.  (We will use emphasis to distinguish information generated,
rather than specified.)  We then have the following rules:





We begin with an axiom that tells us we can extract
    information reposited in the environment:
    
        tenv |- x : t    if tenv(x) = t
    



Next, we have an inference rule that shows us how we
    construct function types; this is where type assertions are
    introduced into the type environment:
    
              tenv + [x : t] |- M : s
    -----------------------------------
    tenv |- (lambda (x : t) M) : t -> s
    



Finally, we show the typing of applications:

        tenv |- F : t -> s    tenv |- A : t
    -----------------------------------
             tenv |- (F A) : s
    





Hence, inference rules produce proof trees.  The text above
the lines (called the antecedents) are judgements that need
to be established using inference rules and axioms.  When we use
only axioms, we have finished our job.  Notice that the rule for
application requires two proofs to establish the antecedent, while the
one for procedures requires only one.


cork@cs.rice.edu
(adapted from shriram@cs.rice.edu)