Section 6: Domains Supporting Recursive Definitions


A Thumbnail Sketch of Computability Theory

For the moment, let us confine ourselves to monolithic programming languages: languages that can only express partial functions mapping the natural numbers N (bitstrings) to the natural numbers. (We could restrict Jam to such a form for example by omitting all operations involving lists and banning functions as values.)

Consider any such programming language P. To qualify as a legitimate programming language, the programs must have a "surface syntax" representation consisting of sequences of symbols over a finite alphabet A. Can P express all total functions mapping N to N? The answer is no. We will present two proofs of this fact. The first proof is a simple counting argument. How many different total functions are there mapping N to N? An infinite number. How many programs are there in P? An infinite number. (We will assume programs in P can be arbitrarily long; otherwise, it is obvious that there aren't enough programs to compute all the total functions on the natural numbers.) But infinite sets are not necessarily comparable in size. You are presumably aware that there are far more real numbers than rationals (or integers), while the set of rational numbers and the set of integers are the same size (cardinality). Two infinite sets A and B are the same size if they can be put into one-to-one correspondence, i.e., if there exists a function f: A -> B that is one-to-one (for all x,y in A [f(x) = f(y) -> x = y]) and onto (for every z in B, there exists x in A such that f(x) = z).

Exercise: Show that the rationals and the natural numbers can be put into one-to-one correspondence. Hint: depict the set of rationals as a two-dimensional table of unbounded width and height and traverse the table along each finite-length diagonal.
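The following Scheme sketch makes the hinted traversal concrete: it enumerates the cells of the table, i.e., pairs of natural numbers, diagonal by diagonal (the name nth-pair is ours, not part of the exercise). Mapping cells to fractions and eliminating duplicates is left to the exercise.

;; Diagonal d of the table contains the pairs whose coordinates sum to d.
;; nth-pair walks the diagonals in order, mapping each natural number to a
;; distinct cell of the table.
(define (nth-pair n)
  (let loop ((d 0) (n n))
    (if (<= n d)                   ; diagonal d has d+1 entries
        (cons n (- d n))
        (loop (+ d 1) (- n (+ d 1))))))

(nth-pair 0)   ; => (0 . 0)
(nth-pair 4)   ; => (1 . 1)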

It is easy to show that the set of all programs in P has the same cardinality (size) as the set of natural numbers. Simply sort the programs in P by length and then alphabetize them (using any ordering you choose on the alphabet A for P). Such an enumeration of programs places them in one-to-one correspondence with the natural numbers N.
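As a sketch of this enumeration, the following Scheme code enumerates all strings over a toy two-symbol alphabet in order of length, alphabetically within each length (the alphabet '(a b) and the names strings-of-length and nth-string are our own stand-ins for A and the enumeration):

;; All strings (represented as lists of symbols) of length k, alphabetically.
(define alphabet '(a b))

(define (strings-of-length k)
  (if (zero? k)
      '(())
      (apply append
             (map (lambda (c)
                    (map (lambda (s) (cons c s))
                         (strings-of-length (- k 1))))
                  alphabet))))

;; The nth string (0-indexed): skip whole length blocks until n falls inside one.
(define (nth-string n)
  (let loop ((k 0) (n n))
    (let ((block (strings-of-length k)))
      (if (< n (length block))
          (list-ref block n)
          (loop (+ k 1) (- n (length block)))))))

(nth-string 0)   ; => ()
(nth-string 3)   ; => (a a)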

Now consider how many functions there are mapping N to N. We know that this set must be at least as big as the set of characteristic functions (functions from N to Boolean) for all subsets of the natural numbers (which is isomorphic to the power set of N). The characteristic function fS for subset S is defined by the equation

fS(n) = if n in S then 1 else 0

One of the most basic results of set theory is the fact that the power set of N is larger than N. It is not difficult to prove this fact by a diagonalization argument. We will present the proof for the natural numbers N but it easily generalizes to any infinite set S. Assume that the power set Pow(N) can be put into one-to-one correspondence with the natural numbers. Let S0, S1, S2, ... be an enumeration of Pow(N) that establishes that correspondence. Then define the set S' as {i in N | i not in Si}. In other words, natural number i belongs to S' iff it does not belong to Si.

Since S' is a member of Pow(N), it must appear in the enumeration S0, S1, ... somewhere, say as Sj. Is j a member of S' = Sj? Examining this question leads directly to a contradiction: j is a member of S' <=> j is not a member of Sj (by the definition of S') <=> j is not a member of S' (since Sj = S'), a contradiction. Hence, no such enumeration S0, S1, ... can exist.

Now we have all of the machinery at hand required to prove that there are more total functions mapping N to N than there are programs in P. The number of functions is at least as large as the power set of N, which is strictly larger than N. But the set of programs in P is countable, i.e., it has the same size as the natural numbers N. Hence, there are more functions than programs.

A Diagonalization Construction of a Function that is Not Computable

We can use diagonalization to directly construct a function mapping N into N that is not computable.

Enumerate the programs of P in order of ascending length and alphabetically among programs of the same length. Let Pi be the partial function (mapping N to N) computed by the ith program in this enumeration. Consider a table listing all the functions Pi. This listing can be visualized as a two-dimensional infinite table listing all possible programs on the vertical axis and all possible inputs (elements of N) to each Pi on the horizontal axis:

    0  1  2  3  . . .
P0  x  x  x  x
P1  x  x  x  x
P2  x  x  x  x
.
.
.

Define P' as the function that, on input i, returns Pi(i)+1 if Pi(i) terminates and 0 if Pi(i) diverges. If we identify 0 with false and non-zero values with true, then P' reports whether or not the computation Pi(i) converges.

Can P' be computed by some program Pj in our enumeration of all programs? It is easy to prove that the answer is NO by contradiction. Assume j exists such that Pj = P'. Since P' terminates for all inputs, the computation P'(j) terminates. Hence, the computation Pj(j) terminates implying that P'(j) = Pj(j)+1 = P'(j)+1, a contradiction. Hence, P' is not a computable function.
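To make the diagonal definition concrete, here is a purely hypothetical Scheme transcription of P'. It assumes two procedures that are not computable, an interpreter run for the ith program and a termination tester halts?; the argument above shows that no program in the enumeration can behave like P-prime, so this is illustration only.

;; HYPOTHETICAL sketch only: run and halts? are assumed, not computable.
;; (run i n)    -- the value of the ith program on input n
;; (halts? i n) -- whether the ith program halts on input n
(define (P-prime i)
  (if (halts? i i)
      (+ (run i i) 1)   ; differ from Pi at input i
      0))               ; Pi(i) diverges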

Computation over General Domains

Most real programs are not monolithic. They incrementally produce outputs in response to incrementally gathered input information. This model of computation is more general than the model on which conventional computability theory is based. It is aptly characterized by the following slogan.

Slogan: Computation is successive approximation

Let us explore what properties domains (sets of data values) and functions on those domains must have to support recursive definitions. How can we construct the fixed-point of a function mapping a computational domain D into D? Why does this question matter? It shows the boundaries of what can be expressed computationally. Much of mathematics lies outside the reach of computation.

We begin by reviewing a standard definition from discrete mathematics.

A partial order is a set D together with a relation <= (which we will read as "approximates") on D (a subset of D x D) with the following three properties:

  reflexivity:   for all a in D, a <= a
  antisymmetry:  for all a,b in D, a <= b and b <= a implies a = b
  transitivity:  for all a,b,c in D, a <= b and b <= c implies a <= c

Some examples include: (i) the set of finite and infinite strings over a finite alphabet ordered by the prefix relation, (ii) the set of finite alphabetic strings ordered by the prefix relation, and (iii) the natural numbers N ordered by the usual <= relation.

A partial order does not have enough structure to support the general solution of fixed point equations. We need to add two restrictions: chain-completeness and consistent-completeness. A chain C in a partial order (D,<=) is a (possibly empty) denumerable (countable) sequence c1, c2, ..., cn, ... such that

c1 <= c2 <= ... <= cn <= ...

An upper bound for a chain C in (D,<=) is an element d in D such that ci <= d for all ci in C. A partial order (D,<=) is chain-complete (or simply complete) iff every chain C has a least upper bound. (An upper bound c* for C is least if c* approximates every upper bound for C.) We typically abbreviate the term complete partial order as cpo. Note that a chain can be empty, implying that every cpo has a least element. We call this least element "bottom" and typically denote it bot. As we will see, bot corresponds to divergence.

Of the three examples of partial orders given above, only the first one is complete. In the partial order of finite alphabetic strings, a chain of the form

"a" <= "aa" <= "aaa" <= ...

has no upper bound, much less a least upper bound. Similarly, in the partial order (N,<=), the chain

0 <= 1 <= 2 <= ...

has no upper bound. In essence, these spaces are missing the "limit" points of chains of approximations.

The notion of cpo described above is still too broad to characterize potential domains for incremental computation. The problem is that such domains do not have enough finite approximations to express all possible finite snapshots of the results of computations. Consider the following cpo (nicknamed "home-plate")

      o       o
      |\     /| 
      | \   / |
      |  \ /  |
      |   X   |
      |  / \  |
      | /   \ |
      |/     \|
    A o       o B
       \     /
        \   /
         \ /
          o 

where each circle ("o") represents a domain element. The two elements A and B are labeled because they have an upper bound but no least upper bound. A computation yielding a result in this domain could describe an answer by outputting both A and B since these elements are "consistent", i.e., have an upper bound. But what element of the domain would be the meaning of such a computation if it produces no additional output? There are two distinct upper bounds for {A,B}. We can rule out this problem by forcing a domain D to include least upper bounds for all bounded subsets (subsets such that every member is less than a particular element of D). In essence, this property asserts that given any set of approximations to the answer to a computation, there is a unique element of the domain that summarizes all of this approximation information.

The home-plate domain fails this test because the set {A,B} is bounded but has no least upper bound.

We formalize this property as consistent completeness. A cpo [D,<=] is consistently complete iff every bounded subset S of D has a least upper bound. A subset S of D is bounded iff there exists d in D such that for all s in S, s <= d.
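These definitions can be checked mechanically on a finite order like home-plate. The following Scheme sketch (the element names bot, A, B, top1, top2 and all procedure names are ours; only A and B are labeled in the diagram) lists the order as explicit pairs and computes the upper bounds and least upper bounds of a subset:

;; The home-plate order: bot below everything; A and B each below top1 and top2.
(define elements '(bot A B top1 top2))
(define leq-pairs
  '((bot bot) (A A) (B B) (top1 top1) (top2 top2)
    (bot A) (bot B) (bot top1) (bot top2)
    (A top1) (A top2) (B top1) (B top2)))
(define (leq? x y) (if (member (list x y) leq-pairs) #t #f))

(define (keep p lst)                       ; filter, written out for portability
  (cond ((null? lst) '())
        ((p (car lst)) (cons (car lst) (keep p (cdr lst))))
        (else (keep p (cdr lst)))))

;; d is an upper bound of S iff every member of S approximates d
(define (upper-bound? d S)
  (if (null? S)
      #t
      (and (leq? (car S) d) (upper-bound? d (cdr S)))))
(define (upper-bounds S) (keep (lambda (d) (upper-bound? d S)) elements))

;; a least upper bound must approximate every upper bound
(define (lower-bound? d S)
  (if (null? S)
      #t
      (and (leq? d (car S)) (lower-bound? d (cdr S)))))
(define (lubs S)
  (let ((ubs (upper-bounds S)))
    (keep (lambda (u) (lower-bound? u ubs)) ubs)))

(upper-bounds '(A B))   ; => (top1 top2) : {A,B} is bounded
(lubs '(A B))           ; => ()          : but it has no least upper bound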

So we assert that a domain for incremental computation is a consistently complete cpo. In fact, we will use the term domain as a synonym for consistently complete cpo.

Topological Properties of Computable Functions

In the preceding section, we identified the topological (approximation structure) properties that a data domain for incremental computation must satisfy. Now let us examine some of the properties that any computable function on such a domain must satisfy. First, any function performing incremental computation on an input from domain A producing output in domain B must be monotonic: for inputs a1, a2 in A, a1 <= a2 implies f(a1) <= f(a2).

Example: Consider the domain of lazy streams of integers. Let cons' denote the stream constructor for this domain akin to Scheme cons and let bot denote divergence. Define the function f on lazy streams of integers as the function that maps cons'(1,bot) to 1, cons'(1,null) to 0, and everything else to bot. This function is not monotonic because bot (the meaning of a divergent term) approximates null yet f(cons'(1,bot)) does not approximate f(cons'(1,null)). (If using bot in program text troubles you, simply use the lambda-expression Omega instead.)

This property asserts that as a function gathers more information about its input it cannot withdraw information about its output that it has already produced!

Monotonicity is not the only topological constraint on incrementally computable functions. Consider a function f on streams of integers that maps every finite approximation

cons'(n1, cons'(n2, ... cons'(nk, bot) ...))

to bot, but maps every infinite stream

cons'(n1, cons'(n2, ... cons'(nk, ...) ...))

to true. A computable function cannot produce this behavior because it can never determine that its input is infinite; all it sees are progressively better finite approximations. For this reason, computable functions must be continuous: given any chain of approximations a0 <= a1 <= ..., f(lub{ai | i = 0,1, ...}) = lub{f(ai) | i = 0,1, ...}. In other words, the behavior of a continuous function on an infinite element is simply its cumulative behavior on the finite approximations to the infinite element.

Example: Consider a function f on lazy streams of integers that maps all elements except

zeroes = cons'(0, cons'(0, ... (cons'(0, ...))))

(the infinite stream of zeroes) to bot and maps zeroes to null.

This function is clearly monotonic since it differs from bottom only on a maximal element (zeroes). Yet it is not continuous because the function is bottom for every element of the chain of finite approximations to zeroes (finite streams of zeroes ending in bot). By the definition of continuity, the function would have to yield bot on zeroes to be continuous at zeroes, but it doesn't.

Common cpos in Programming Languages

The domain of data values supported by a programming language forms a cpo. Assume that we prohibit functions as arguments and return values in LC. Then the only kinds of semantic objects in the language are integers and unary functions mapping integers to integers. We can view this domain as a cpo as follows. The integers form a flat cpo: bot approximates every integer, and distinct integers are incomparable. The strict partial functions from integers to integers are ordered by inclusion of their graphs: f approximates g iff the graph of f (its set of input/output pairs) is a subset of the graph of g.

The everywhere undefined function has the empty set as its graph provided we ignore pairs with return value bot. This function obviously approximates all other functions. Since the degenerate pair (bot, bot) must belong to the graph of every strict partial function, we often omit this pair when describing the graph of a function. Hence, we often say that the graph of the partial function mapping 0 to 0 and diverging everywhere else is {(0,0)}. Integers and functions are incomparable but there is a divergent element in both domains. We typically use the name bot for the least element of a domain without bothering to qualify it with the name of the corresponding domain. In fact, we can identify the bottom elements of all domains and assert that they are all denoted by a single special element bot. Is bot the same as a function that diverges everywhere? No! An expression "defining" a function can diverge, which is different from defining the everywhere divergent function. (In Scheme, the latter produces a closure which can be identified using the function? primitive. It diverges only when it is applied to an argument.)
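As a small illustration of the graph ordering (the representation and the names below are ours, not part of LC), we can model the graph of a finite partial function as a list of input/output pairs and test the approximation relation as graph inclusion:

;; Graphs of finite partial functions, omitting the degenerate (bot, bot) pair.
(define f-empty '())                 ; the everywhere-undefined function
(define f0      '((0 . 0)))          ; maps 0 to 0, diverges elsewhere
(define f01     '((0 . 0) (1 . 1)))  ; maps 0 to 0 and 1 to 1, diverges elsewhere

(define (subset? s1 s2)
  (cond ((null? s1) #t)
        ((member (car s1) s2) (subset? (cdr s1) s2))
        (else #f)))

;; f approximates g iff the graph of f is contained in the graph of g
(define (approximates? f g) (subset? f g))

(approximates? f-empty f01)   ; => #t
(approximates? f0 f01)        ; => #t
(approximates? f01 f0)        ; => #f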

The Meaning of Recursive Definitions

Consider the function definition written in Scheme:

(define fact 
  (lambda (n)
    (if (zero? n)
        1 
        (* n (fact (sub1 n))))))

What meaning in the space of strict partial functions mapping the flat domain of integers to itself should we assign to fact?

How does the meaning of the rhs (right-hand-side)

(lambda (n)
  (if (zero? n)
      1 
      (* n (fact (sub1 n)))))

of the preceding definition change as the meaning of the recursive call on fact changes, i.e. how does the function

(lambda (fact)
  (lambda (n) 
    (if (zero? n)
        1 
        (* n (fact (sub1 n))))))

behave?

Let's explicitly bind the preceding function to the name FACT:

(define FACT 
  (lambda (fact)
    (lambda (n) 
      (if (zero? n)
          1 
          (* n (fact (sub1 n)))))))

Assume that the recursive call on fact always diverges when called. Let's define such a function:

(define EMPTY (lambda (n) OMEGA))
(define OMEGA ((lambda (x) (x x)) (lambda (x) (x x))))

Given the empty function EMPTY, FACT(EMPTY) constructs the partial function {(0,1)} (that diverges everywhere except for returning 1 when the input is 0). All computable functions including FACT are monotonic: if x <= y then FACT(x) approximates FACT(y). Given this property and the property that FACT is continuous (all computable functions are continuous), we can deduce (by repeatedly appealing to the monotonicity of FACT):

EMPTY <= FACT(EMPTY) <= FACT(FACT(EMPTY)) <= ... <= FACT(... FACT(EMPTY) ...) <= ...

The preceding sequence is clearly a chain. Hence, it must have a least upper bound, which we write fix(FACT) because it is the least fixed-point of the call-by-name functional corresponding to FACT. This function is the meaning of fact determined by the recursive definition above. In general, every computable function mapping a cpo D to D has a least fixed-point. Hence, we can interpret any recursive definition as the least fixed-point of the corresponding function.
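We can watch this chain grow in Scheme (assuming the definitions of FACT and EMPTY given above; the helper iterate is ours). The kth element of the chain agrees with the factorial function on inputs 0 through k-1 and diverges on larger inputs:

;; iterate applies FACT k times to EMPTY, producing the kth element of the chain
(define (iterate k)
  (if (zero? k)
      EMPTY
      (FACT (iterate (- k 1)))))

((iterate 1) 0)   ; => 1
((iterate 3) 2)   ; => 2
((iterate 6) 5)   ; => 120
;; ((iterate 3) 5) diverges: this approximation carries too little information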

Call-by-Name vs Call-by-Value Fixed-Points

We cannot express the functional corresponding to the fact program in Scheme because all function abstraction in Scheme is restricted to call-by-value. We do not want (FACT OMEGA) to diverge, because we want it to capture the meaning of

(lambda (n)
  (if (zero? n)
      1 
      (* n (fact (sub1 n)))))

when the recursive call on fact is replaced by OMEGA, which is NOT a value. Hence, we want the outer lambda abstraction in

(lambda (fact)
  (lambda (n) 
    (if (zero? n)
        1 
        (* n (fact (sub1 n))))))

to be interpreted using call-by-name. We finessed this issue in building our ascending chain by starting with the function EMPTY which is a value, namely the strict function that diverges everywhere. This is not the same as a divergent term like OMEGA. In Scheme, evaluating EMPTY terminates; evaluating OMEGA does not!
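A brief Scheme illustration of the difference (again assuming FACT, EMPTY, and OMEGA from above; the name fact1 is ours):

(define fact1 (FACT EMPTY))   ; EMPTY is a value, so this application proceeds
(fact1 0)                     ; => 1
;; (fact1 1) diverges: it evaluates (EMPTY 0)
;; (FACT OMEGA) diverges: the argument OMEGA is evaluated before FACT is entered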

Note that the least fixed point of the call-by-value lambda abstraction

(lambda (fact) ;; always diverges if fact diverges
  (lambda (n) 
    (if (zero? n)
        1 
        (* n (fact (sub1 n))))))

is bot! The strict factorial function is another fixed-point but it is not least!

The Full Domain of LC

We made the simplifying assumption that functions are never passed as arguments or returned as values in describing the domain of LC above. What happens if we relax these restrictions? Let D denote (stand for) the domain of LC. It must satisfy the following equation

D = Zbot + (D => D)

where Zbot is the flat domain of integers (the conventional integers augmented by bot), + means the standard union operation except bottom elements are coalesced, and D => D means the set of all strict continuous functions mapping D into D. (A function f mapping the cpos D1, D2, ... Dn into the cpo D0 is strict iff f(d1,d2,...,dn) = bot if any di = bot.)

This equation can be interpreted as a recursive definition of the domain D. We can solve this equation by constructing a functional on domains and taking its least fixed-point. But a lot of technical machinery must be developed. First we have to show that the set of domains forms a cpo. Then we have to show that the + and => operations (functions) on domains are continuous. This deeper inquiry into the semantics of LC is the subject matter of Comp 511.