Lecture 14: Polymorphism

 

 

 

Polymorphism

In our last lecture, we introduced two key concepts: union types and recursive types. We also introduced the datatype mechanism, which can express both these concepts. With datatype, we can do away with some of the types we had introduced into TLC earlier, such as ilist, which can be written (using a more compact concrete syntax that omits selector names) as

datatype ilist = n () + c ( int , x )

Puzzle: Determine what these declarations represent:
datatype x = n () + c ( int , x )
in
  datatype y = ni () + ci ( int -> int , y )
  in
    ...
    datatype z = nz ( int ) + cz ( int -> z )
    ...

Explicit Polymorphism

In TLC with datatype definitions, we can define both lists of int and lists of (int -> int) as follows

(datatype list_int
  ([null_int null_int?]
   [cons_int cons_int? (int first_int) (list_int rest_int)]

 (datatype list_int->int
  ([null_int->int null_int->int?]
   [cons_int->int cons_int->int? (int->int first_int->int) (list_int->int rest_int->int)])
 
  ...))

Given these datatype definitions and the addition of Scheme-like if expressions to TLC, we can write a procedure that maps its argument over a list of integers, returning a list of integers:

(define map
  (lambda (f (int -> int) l list_int)
    (if (null_int? l)
      null_int
      (cons_int (f (first_int l)) (map f (rest_int l))))))

Similarly, we can write a procedure that maps its argument over a list of functions...

(define map
  (lambda (f ((int -> int) -> int) l list_int->int)
    (if (null_int->int? l)
      null_int
      (cons_int (f (first_int->int l)) (map f (restr_int->int l))))))

Of course, these two procedures are identical except for the types specified for the arguments. In Scheme, we have only one map procedure that subsumes all of these; in languages like Pascal or TLC, we need to have one for each argument type.

If we look carefully at these declarations, we notice that only two of the types actually matter: the type of the elements in the list argument and the return type of the function argument. Everything else in the declaration is implied by these two parameters. Hence, we could rewrite map using parameterized types as follows:

(define map
  (lambda (alpha)
    (lambda (beta)
      (reclet g ((beta -> alpha) -> (list beta) -> (list alpha))
	(lambda (f (beta -> alpha))
	  (lambda (l (list beta))
	    (if (null_beta? l)
	      null_alpha
	      (cons_alpha (f (first_beta l))
		((g f) (rest_beta l))))))
	g))))

In this declaration, the arguments being passed to map are types; hence, we cannot write this procedure in TLC. So we need another extension to the type language to handle these kinds of abstractions. We call this new language XPolyLC.

  1. We can add a new form of abstraction. We denote it with the keyword Lambda, which is similar to lambda except that it binds types, not values, to identifiers. Using Lambda, we can write map as
    (Lambda (alpha Omega)
      (Lambda (beta Omega)
        (reclet g ((beta -> alpha) -> (list beta) -> (list alpha))
          (lambda (f (beta -> alpha))
    	(lambda (l (list beta))
    	  (if (null_beta? l)
    	    null_alpha
    	    (cons_alpha (f (first_beta l))
    	      ((g f) (rest_beta l)))))))))
    
    where the keyword Lambda refers to abstraction over types rather than abstraction over data values of some type. Recall that in TLC, we have to annotate each binding occurrence with its type. But what type can we assign to types themselves? We use Omega, which represents the set of all types. Since the ``type'' of all arguments bound by Lambda will always be Omega, we shall henceforth leave this declaration implicit. This is the introduction rule for type abstractions.
    Now consider this definition of map:
    (define map
      (Lambda (alpha)
        (Lambda (beta)
          (lambda (f (beta -> alpha))
    	(lambda (l (list beta))
    	  (if (null_beta? l)
                null_alpha
                (cons_alpha (f (first_beta l))
                  ((map f) (rest_beta l)))))))))
    
    When we perform the recursive call to map, we need to pass to pass the types first to instantiate map. However, we do not have any facilities for passing types as arguments, since they are not values.
  2. Hence, we introduce an elimination rule for type abstractions. This is done in the form of a new kind of application, Tapp (for ``type application''). Tapp invokes an objected created with Lambda with a type as an argument. Hence, we would write the recursive call to map above as
                  (((Tapp (Tapp map alpha) beta) f) (rest_beta l))
    
    with the rest of the code remaining unchanged.

In TLC, and indeed in many programming languages, a programmer provides type annotations at binding instances of variables, but leaves the type of individual expressions to be inferred (computed) and checked by the language implementation. In XPolyLC, we have added two new kinds of expressions, type introductions and eliminations, so we examine what the types of these expressions are.

  1. What is the type of the (lambda (l (list beta)) ...) expression in either definition of map?

    ((list beta) -> (list alpha)).

  2. What is the type of the (lambda (f (beta -> alpha)) ...) expression in either definition of map?

    ((beta -> alpha) -> type of (1)).

  3. What is the type of (Lambda (beta) ...) in the latter definition of map?

    The type is (beta -> type of (2)), for all possible values of beta. Hence, we write this as (forall beta . (2)).

  4. Finally, what is the type of (Lambda (alpha) ...)?

    This has type (alpha -> type of (3)), for all possible values of alpha. We write this type as (forall alpha . (3)).

In reality, we need to perform a type application to instantiate each primitive as well. Hence, the resulting code looks like

(define map
  (Lambda (alpha)
    (Lambda (beta)
      (lambda (f (beta -> alpha))
	(lambda (l (list beta))
	  (if ([Tapp null? beta] l)
	    [Tapp null alpha]
	    ([Tapp cons alpha] (f ([Tapp first beta] l))
	      (([Tapp [Tapp map alpha] beta] f) ([Tapp rest beta] l)))))))))

Hence, alpha and beta range (abstract) over types in the same manner that lambda-bound identifiers range (abstract) over values. This property of a type system, wherein ``type variables'' are used to abstract over types, is called polymorphism. This new form of types is called a type schema, since it represents a framework which can be instantiated with several different types. This kind of type system is said to be explicitly polymorphic, since the programmer is required to manually specify each instance of type abstraction, and to perform the appropriate type applications.

Implicit Polymorphism

While explicit polymorphism adds considerable power to a programming language, it is notationally clumsy because all type instantiations (applications) must be explicitly performed in the code. (Generic Java requires explicit instantiation of polymorphic (generic) classes but not of polymorphic methods.) To eliminate this problem, we allow the programmer the following freedom: every type declaration required in TLC can be left blank. We then design typing rules that will attempt to deduce an appropriate type from context such that the resulting expression will pass the type checker.

Consider the following example. If we have

(lambda (x _) x)

what can we write in the blank? We could certainly use int; likewise, we could use bool. In fact, any well-formed type we write in the blank will satisfy the type checker.

Now consider a typing judgement for let:

Tenv |- Exp : t    Tenv + [f : t] |- Bexp : type
------------------------------------------------
       Tenv |- (let f t Exp Bexp) : type

Say the Exp is (lambda (x _) x). Then how many ways are there to type it and use it in the body? We can only use whatever was put in the place of _, so the resulting identity function can only be used on objects of the one type t above. For instance, it could not be used to type

(let ((id (lambda (x _) x)))
  (if (id true)
    (id 5)
    (id 6)))

since the closure is applied to both booleans and integers. Hence, this is not a convenient typing rule. Instead, the SML programming language uses a modified judgement for let:

  Tenv |- Bexp [f / Exp] : type
---------------------------------
Tenv |- (let f _ Exp Bexp) : type

This rule literally copies the Exp expression through Bexp; when applied to the code fragment we examined earlier, we end up typing the expression

(if ((lambda (x _) x) true)
  ((lambda (x _) x) 5)
  ((lambda (x _) x) 6))

As can be seen, everything -- including the programmer's demand to deduce a type -- is copied by this rule. Hence, an independent deduction can be made at each location.

The last rule is easy to implement, though it can be expensive in terms of the time taken for the typing phase. But, how do we implement type deduction, especially since there is an unbounded number of types? The answer is that we create a new type, assuming no properties for it, and based on the constraints derived from typing the body M, we derive the "deduced" type.

In summary, polymorphism is the ability to abstract over types. Implicit polymorphism is the ability of a type system to assign many different types to the same phrase without the need for explicit type declarations. Several languages, including SML, Haskell and Clean, provide type inference ("computing the most general possible type for each expression") with implicit polymorphism for let (and possibly other binding constructs).

Exercise: What is the connexion between explicit and implicit polymorphism?

Back to course website