COMP 311 Principles of Programming Languages (Fall 2010)

Programming Assignment 6:
Removing Recursion and Symbolic Variables from Jam

Assigned: Monday, November 8, 2010
Due: 12:00 noon, Monday, November 22, 2010

Files

No supporting code. Use your own interpreter from Assignment 5, Step 1 (i.e. untyped Jam in the value/value and value/need modes).

Some Simple Test Inputs

Overview

Preparation Create a programs/6 directory within your comp311 directory. All of your files for this assignment should be stored in the programs/6 subdirectory. Do not put files in subdirectories below programs/6.

Teamwork You are required to do this assignment and all other programming assignments using pair programming. When you submit your assignment, indicate the composition of your team (names, student ids, and email addresses) in your README file.

If you cannot find a partner and would like one, send an email to comp311@rice.edu and we will try to find one for you. Teams of more than two members are not permitted.

Task Your assignment is to transform a dialect of untyped Jam to continuation passing style (CPS) and to convert the Jam environment representation to use static distance coordinates. The assignment should be done in three separate phases: (i) modifying your interpreter for Step 1 of Assignment 5 by restricting the interpreter to eager evaluation and splitting the recursive let construct into separate recursive letrec and non-recursive let constructs; (ii) modifying the parser to rename variables to eliminate the shadowing of variables by nested variable declarations (map parameters, let bindings, reclet bindings), and (iii) transforming Jam programs to CPS, leaving the interpreter from phase (i) unchanged; and (iv) the conversion of Jam program text to static distance coordinate form, which must be supported by modifying the environment representation in the Jam interpreter.

This is a long, challenging assignment and will be worth 200 points instead of 100 points.

Phase I As a first phase of this assignment, you will modify your Step 1 parser and interpreter from Assignment 5 to support only eager evaluation and split recursive let into two constructs: (i) letrec, which is recursive let with right hand sides limited to map constructions, and (ii) let, which is ordinary non-recursive let abbreviating the application of a map to the right hand sides. These modifications to the Jam language simplifies the transformation required to implement Phase II. The language is to be changed at the level of the grammar; here are the relevant changes to the grammar:

Expressions

Exp         ::= ...
              | let Def+ in Exp
              | letrec MapDef+ in Exp
              | Map
              | ...
Map         ::= map IdList to Exp

Definitions

Def    ::= Id := Exp ;
MapDef ::= Id := Map ;

In your Java implementation, the only public evaluation method in the Interpreter class must be called eval() instead of the former name eagerEval() used in Assignment 5.

Note In extending your parser to support letrec, remember that your code must support printing ASTs including letrec constructions as strings. Hence, in your Java code, you must define toString() in the letrec AST node class.

Phase II Modify your parser to prevent shadowing variable names by converting the name of each Jam variable from x to x:d where d is the lexical depth of the declaration of x. If x is free (not allowed in legal programs) then it has lexical depth 0. If it occurs inside one level of lexical nesting, it has lexical depth 1. For example, the program

(map x to x)(7)

becomes

(map x:1 to x:1)(7)

after renaming. The renamed variables cannot be confused with existing variable names because : is not a legal character in variable names read by the parser.

In your Java code, add a method

public AST unshadow();

in the Interpreter class that performs this transformation on the Jam AST associated with this.

Since the unshadow operation is not bundled with either parsing or syntax checking, you will subsequently be able to support both ordinary interpretation (as performed in Phase I above) and interpretation after converting a program to CPS form.

The unshadowing transformation permits let constructs to be re-interpreted as let* constructs without affecting program semantics. The correctness of our rules for CPS transformation hinges on this identity.

Phase III Write a postprocessor (function from abstract syntax to abstract syntax) for your parser that transforms a Phase I Jam program to equivalent CPS form. Specifically, given a program M, your postprocessor will generate the program Cps[map x to x,M0] where M0 is M converted to unshadowed form and Cps is a binary function mapping Jam program text to Jam program defined by the transformation rules given below. These rules are a loosely based on the exposition in Friedman, et al., Chapter 8.

These rules presume that variables have been renamed to prevent “holes in scope” (nested variables with the same name) as described in Phase II. They will not work correctly on programs that shadow variable names without Phase II being performed first.

For the purposes of this assignment, we will consider operator applications (both unary and binary) as syntactic sugar for applications of corresponding primitive operations. Hence operator applications are treated just like primitive applications.

In your Java code, formulate the CPS postprocessor as a method

public AST convertToCPS()

in your Interpreter class.

These interfaces are important because we will use them to test your code.

You must also provide a new method/function for performing interpretation of the CPSed program. In your Java code, you must add a method

public JamVal cpsEval();

to the Interpreter class that converts the associated program to CPS and then interprets the transformed program.

You can test that your implementation of the CPS transformation preserves the meaning of programs in Java by comparing the results produced by eval() and cpsEval().

To state the CPS transformation rules, we need to introduce a few technical definitions. Study them until you thoroughly understand them.

A Jam application E(E1, ...,En) is primitive if E is a primitive Jam function. Recall that we are interpreting operator applications as syntactic sugar for applications of corresponding primitive operations. So an application is primitive if and only if the rator of the application is either a primitive function or an operator. For example, the applications first(append(x,y)) and square(x) * y are both primitive while the applications square(4) and append(x,y) are not.

A Jam expression E is simple if and only if all applications except those nested inside map constructions are primitive, i.e. have a primitive function or operator as the rator. For example,

let x := first(a) * b * first(c);
Y := map f to let g := map z to f(z(z))) in g(g);
in cons(x, cons(Y, null))

and

x+(y*z)

are both simple. In contrast,

f(1)

and

let Y := map f to let g := map z to f(z(z))) in map x to (g(g))(x);
in Y(map fact to map n to if n=0 then 1 else n*fact(n-1))

are not simple because f is not primitive and Y is not primitive.

The following rules define two syntactic tranformers (functions) on Jam program text: the binary transformer Cps : Jam * Jam -> Jam and the unary transformer Rsh : Simp -> Simp, where Jam is the set of Jam expressions and Simp is the set of simple Jam expressions (Rsh stands for “reshape”). The binary transformer Cps[k,M] takes a Jam expression k denoting a unary function, and an unshadowed Jam expression M as input and produces a tail-recursive Jam expression with the same meaning as k(M).

The unary transformer Rsh is a “help” function for Cps that take an unshadowed simple expression as input and adds a continuation parameter to the map expressions and function constants embedded in simple expressions. Rsh also adjusts applications of the arity primitive function to ignore the added continuation argument.

In the following rules, S, S1, S2, ... denote simple Jam expressions; k, A, B, C, E, E1, E2, ..., T denote arbitrary Jam expressions; x1, x2, ... denote ordinary Jam identifiers, and v, v1, v2, ... denote fresh Jam identifiers that do not appear in any other program text. The variable names x, y, and k denote themselves.

Definition of Cps The following clauses define the textual tranformation Cps[k, M]:

Definition of Rsh The helper transformer Rsh[S] is defined by the following rules:

For the purposes of testing your programs we require the following standardization. The top-level continuation must have exactly the syntactic form

using the variable name x. In some transformations, you must generate a new variable name. For this purpose, use variable names of the form :k where k is an integer. These name cannot be confused with the names of variables that already exist in the program. The sequence of variable names generated by your CPS tranformer must be :0, :1, :2, ... so that your CPS tranformer has exactly the same behavior as our solution. Note that you must transform a program by making the leftmost possible reduction given that match variables S and E can only match raw program text (any embedded calls on Cps and Rsh must have already been reduced).

Phase IV As the fourth part of the assignment, you will write another processor for Jam abstract syntax that transforms conventional Jam abstract syntax into static distance format. Your static distance coordinates should print as [0,0], [0,1], [1,0], [1,1], and so on, or [x,y] in general, where x represents static distance and y the parameter index. Inside let and map statements, the number of definitions or parameters should be printed out as [*1*], [*2*], [*3*], and so on.

in the Interpreter class that performs this transformation on the Jam AST associated with this.

You must also write a new interpreter for static distance format (sharing as much existing code as possible) that represents environments as lists of activation records where activation records are represented as arrays in Java.

to the Interpreter class. The method SDEval() converts the program associated with this to static distance format and then interprets it. The method SDCpsEval() converts the program first to CPS and then to static distance format and interprets the result.

Hint In Java, it is tempting to try to refactor the AST composite hierarchy into a generic composite hierarchy parameterized by the form of variables (symbols or static distance coordinates) and environments (lists of binding pairs or lists of activation records) so that the distinctions between the two program representations are reflected in the typing of program text. While such a refactoring is possible, it is massive since every AST class (even including the constant classes) must be modified and every visitor class that processes ASTs must be modified. It is much easier (albeit less precise in the typing of program expressions) to simply add some new subclasses to the AST hierarchy that are static distance variants of the existing AST classes. Of course, this code requires some type casts. Note that static distance code in this representation is also conventional code since the static distance information is added to the conventional representation (which still has symbolic variable names).

Testing You are expected to write unit tests for all of your non-trivial methods or functions. For more information, refer back to Assignment 1.

Extra Credit (20 points)

Extend the Jam source language to include the new construct

Exp ::= ... | letcc x in M

The new construct letcc x in M binds the identifier x to the current continuation, and evaluates M in the extended environment. A continuation is a closure of one argument, reshaped to take an auxiliary continuation argument (like all other closures after CPS) which it discards. Since continuations are ordinary values, they can be stored in data structures, passed as argments, and returned as values.

The letcc construct is only supported in the interpreters that perform CPS conversion on the code. The conventional interpreters abort execution with an error if they encounter a use of letcc. To perform CPS conversion on program containing letcc, we extend our rules for CPS conversion as follows.

First, a Jam expression E is simple if and only if all occurrences of the letcc construct and non-primitive applications appear nested within map constructions.

Second, we add the following clause to the definition of the Cps syntax transformer:

Testing and Submitting Your Program

The directory tests contains the Java test file Assign6Test.java consisting of some very simple unit tests for your program. You can also use the test file Assign4Test.java as basis for creating test files for this assignment.

Your program directory should only contain the files that you want to submit. We will subsequently provide you with test scripts similar to the ones we provided for Assignment 4. Running these scripts will alert you to any problems with your directory configuration. We might also place a few addtional test programs in the directory tests.

Make sure that your program passes the sample unit tests and works with our grading script before submitting it. Create a README file in the your directory program/6 that

contains nothing but the user names of the team members in lines 1 and 2 and a blank line as line 3,
outlines the organization of your program, and
specifies what testing process you used to confirm the correctness of your program.

Make sure that your README file is structured correctly:

It has to be named README (all upper case, no .txt or anything)
It has to be in the assignment's "root directory", e.g. ~/comp311/programs/6
The first two lines should only contain your usernames and nothing else. The third line should be blank (if you're working alone, leave lines 2 and 3 blank), e.g.:

mgricken
dmp

This is where the readme text starts.

Any test data files for your program must be stored in the programs/6 directory. Do not store anything in subdirectories below programs/6.

Each procedure or method in your program should include a short comment stating precisely what it does. For routines that parse a particular form of program phrase, give grammar rules describing that form.

Please make sure to remove or disable all debug output before submitting. Excessive use of print statements considerably slows your interpreter and thus testing your project down. If your interpreter generates a timeout error because debug output slows it down, you may not get credit for the test cases that took too long.

To submit your program, make sure that everything that you want to submit is located in the directory comp311/programs/6 on CLE@R and type the command

/coursedata/courses/comp311/bin/turnin311 6

from within the comp311/programs directory.

The command will inform you whether or not your submission succeeded. Only submit one copy of your program per team. If you need to resubmit an improved version your program, submit it from the same account as the original so that the old version is replaced.

Back to course website

Assigned: Monday, November 8, 2010 Due: 12:00 noon, Monday, November 22, 2010

Files

Overview

Extra Credit (20 points)

Testing and Submitting Your Program

Assigned: Monday, November 8, 2010
Due: 12:00 noon, Monday, November 22, 2010