COMP 311 Principles of Programming Languages (Fall 2010)

Programming Assignment 7:
Machine Level Representation and Management of Jam Data Structures

Assigned: Friday, November 19, 2010
Due: 11:59 PM, Friday, December 3, 2010

Files

No supporting code. Use your own interpreter from Assignment 6.

Overview

Preparation Create a programs/7 directory within your comp311 directory. All of your files for this assignment should be stored in the programs/7 subdirectory. Do not put files in subdirectories below programs/7.

Teamwork You are required to do this assignment and all other programming assignments using pair programming. When you submit your assignment, indicate the composition of your team (names, student ids, and email addresses) in your README file.

If you cannot find a partner and would like one, send an email to comp311@rice.edu and we will try to find one for you. Teams of more than two members are not permitted.

Task Your assignment is to change the representation of the data structures in the Jam run-time environment so that all such storage is explicitly managed by your interpreter rather than run-time system of the underlying Java Virtual Machine. In other words, you will implement your own "new" operations for Jam run-time objects instead of relying on corresponding operations in Java. All of your Jam run-time objects will be stored in a large array of integers, mimicking machine memory. The assignment is most easily done in four phases: (i) conversion of Jam programs to "complete CPS" form, (ii) explicit representation of Jam code, (iii) explicit allocation of all Jam objects without garbage collection, and (iv) support for storage reuse using garbage collection. The first two phases are short exercises.

The "complete CPS" transformation converts all calls on program defined procedures to tail-calls so that they can be implemented as jumps in a compiler. It also means that all live storage (except for temporaries created to hold the intermediate results of chains of primitive operations) is accessible from the current activation record even though we are not explicitly storing a dynamic link; the continuation closure includes the dynamic link as its environment field.

A compiler stores the intermediate results of chains of primitive operations in machine registers and if necessary, a spill area in current activation record. We will simulate this approach by defining a global stack of registers (temporaries). A real compiler must do more work because it must determine which temporary quantities are stored in registers and which are stored in space it reserves for temporaries in the activation record.

To manage our global stack of temporaries, we will push all intermediate results on this stack while they are active and pop them off when they are no longer needed. Every pushed temporary must eventually popped to avoid a storage leak. It is easy to write this code such that push and pop operations are always paired in the same procedure of your interpreter. The stack plus a pointer to the current activation record is your root set for performing a garbage collection.

To help you aggressively test the various phases of your program, we are providing the test program called sieve in the tests subdirectory. We also recommend converting some test programs from previous assignments to the syntax required for this assignment (recall that let is not recursive and letrec only accepts maps as right-hand-sides. To execute the sieve program, you will have to make one change to your interpreter, namely eliminating the run-time check to confirm that the second argument to cons is a list. In this program, the second argument is often a thunk (map of no arguments).

Phase I As a first phase of this assignment, you will modify your CPS converter from Assignment 6 to perform "complete CPS conversion" (excepting primitive operations) and clean up a few rules from the previous assignment that are unnecessarily verbose. In Assignment 6, the CPS converter eliminated all nested procedure calls, but we still permitted nested let constructions. If we were compiling Jam to machine code, the CPS form from Assignment 6 provides the minimum restructuring required to produce tail calling code where all procedure calls are implemented as jumps (procedures never return!). Nested let and letrec constructions can be "flattened" into a single rec-let provided that all variable names bound in the collection of let and letrec constructions are uniquely named (so the flattening does not capture bound variables).

let x₁ := let x₂ :=E₂;
           in B₂;
 in B₁

must return the value of the nested let expression to the context of the outer let expression.

Fortunately, it is very easy to the modify the CPS process from Assignment 6 to eliminate such nested let/letrec constructions. All that is required is a small change to the definition of "simple expression". In particular, no let or letrec construction is simple. Hence, the only change that you need to make to your CPS converter from Assignment 6 is to change the clauses for let and letrec in the definition of the "is-a-simple-expression" predicate.

For our Jam interpreters, however, we need a more thorough form of CPS conversion to eliminate nested subcomputations. In a Jam interpreter, the evaluation of a let or letrec construction introduces a new activation record (environment frame).

Phase IA Modify the rules for CPS conversion of let expressions as follows:

If M is let x₁ :=S₁; ... ; x_m := S_m; x_m+1 :=E_m+1; ... ; x_n :=E_n; in B, where 0 < m <= n:

Cps[k, let x₁ :=S₁; ... ; x_m := S_m; x_m+1 :=E_m+1; ... ; x_n :=E_n; in B] => let x₁ :=Rsh[S₁]; ... ; x_m := Rsh[S_m]; in Cps[k, let x_m+1 :=E_m+1; ... ; x_n :=E_n; in B]
If M is let x1 :=E1; ... ; xn :=En; in B, where n > 0:

Cps[k, let x1 :=E1; ... ; xn :=En; in B] => Cps[map x1 to Cps[k, let x2 :=E2; ... ; xn :=En; in B], E1]

In the both of these transformations, the list of bindings in a generated let construction can be empty. In the case, you simply contract the let construction to its body.

This modification slightly simplifies the form of CPS output which tends to be very wordy in the absence of simplifying transformations (such (map x to x)(E) => E). In this assignment, we choose not to perform these simplifications for the sake of brevity.

Phase II In the machine-level representation of Jam data introduced in Phase III, you must represent all the fields of a closure object as ordinary integers. For the environment field, the proper choice of integer representation is obvious since an environment is simply a pointer (integer index) to an activation record allocated in your heap array. But the code portion of the closure is not stored anywhere in your heap array. To identify the code bodies of closures using integers, we must build an auxiliary table containing each map expression that appears in the program text. When you convert program text to static distance representation, you can enter each map that you encounter is an array of ASTs and use the integer index to refer to the map. You can include a field to store this index in the static distance coordinate version of the map abstract syntax "class".

Modify the representation of closures in your static distance interpreter from Assignment 6 so that the code portion is represented by an integer. (When you build a closure for a map M, store the integer index for M in the code field instead M itself.)

Phase III In this phase, you must reimplement every "new" operation performed by your static distance Jam interpreter as the allocation of a block of integer words from your heap array. We suggest a representation scheme in which every Jam data value and run-time data object is represented by an integer reference to a block of memory in your heap array.

There are five different forms of “new” in your Jam interpreter:

the construction of cons nodes
the construction of ref nodes
the construction of closures
the construction of activation records (the records in your static distance representation of environments);
the construction of Jam int constants
the construction of Jam bool constants

Each of these forms must be represented by a reference to a block of integer words in your heap array. Each of these five forms of Jam data must be distinguishable at run-time. A simple but effective way to distinguish the various forms or records is to use the lead word as a header (tag).

The null constant and unit constant do not appear in the list of allocated forms of data above because there is only a single value of each these forms. Such constants can be represented most conveniently by using a "reserved" reference "address"; zero or a small negative number like -1 is a good choice. Of course no object can be stored at this reserved address in your heap array. Alternatively, you can use a conventional tagged record representation but only allocate one copy of each value. Then the address of this single copy can be used as the unique representation for the value. (The addresses of any such unique copies must be included in the root set!)

In this phase you will modify your interpreter to this new representation for Jam run-time data. Your interpreter will simply abort any computation that runs out of heap memory.

To reduce the amount stack space used in the evaluation of expressions involving constructor operations like cons, you can preallocate the constructed record (with null fields) before evaluating the argument expressions and store the result for each field in the preallocated record as it computed.

In your Java code base, add a second argument int heapSize to each visible constructor for your Interpreter class. This argument specifies the number of cells ("words") in your memory array. You must also add a method int[] getMemory() to the Interpreter class that returns the memory array to simulate a Jam heap.

The autograding program will use the memory returning operations provided as part of the top level interface to confirm that you are using memory to represent Jam values, but it will not presume any particular data representation.

Note that your top-level evaluation methods (functions) must ”decode” your low level Jam data representations into high level Jam data values with exactly the same representation in Java as in previous assignments.

If you run out of memory, you should throw an EvalException, EvalError or an error with a message containing the substring 'eval, depending on what language you use, just like in the previous assignments.

Phase IV Write a Cheney copying collector to reclaim unused storage in your heap array. The only root for collection is the current activation record (the value of the env pointer) and any values that have been computed as arguments for the next activation record (the value of the pending args stack).

Since garbage collection is only safe for code that you have converted to CPS, delete the Interpreter methods that support the static distance interpretation of Jam without performing a CPS transformation.

Testing and Submitting Your Program

The directory tests contains the Java test file Assign7Test.java containing some very simple unit tests for your program.

Your program directory should only contain the files that you want to submit. Make sure that your program passes the sample unit tests and works with our grading script before submitting it. Create a README file in the your directory program/7 that

contains nothing but the user names of the team members in lines 1 and 2 and a blank line as line 3,
outlines the organization of your program, and
specifies what testing process you used to confirm the correctness of your program.

Make sure that your README file is structured correctly:

It has to be named README (all upper case, no .txt or anything)
It has to be in the assignment's "root directory", e.g. ~/comp311/programs/7
The first two lines should only contain your usernames and nothing else. The third line should be blank (if you're working alone, leave lines 2 and 3 blank), e.g.:

mgricken
dmp

This is where the readme text starts.

Any test data files for your program must be stored in the programs/7 directory. Do not store anything in subdirectories below programs/7.

Each procedure or method in your program should include a short comment stating precisely what it does. For routines that parse a particular form of program phrase, give grammar rules describing that form.

Please make sure to remove or disable all debug output before submitting. Excessive use of print statements considerably slows your interpreter and thus testing your project down. If your interpreter generates a timeout error because debug output slows it down, you may not get credit for the test cases that took too long.

To submit your program, make sure that everything that you want to submit is located in the directory comp311/programs/7 on CLE@R and type the command

/coursedata/courses/comp311/bin/turnin311 7

from within the comp311/programs directory.

The command will inform you whether or not your submission succeeded. Only submit one copy of your program per team. If you need to resubmit an improved version your program, submit it from the same account as the original so that the old version is replaced.

Back to course website

Assigned: Friday, November 19, 2010 Due: 11:59 PM, Friday, December 3, 2010

Files

Overview

Testing and Submitting Your Program

Assigned: Friday, November 19, 2010
Due: 11:59 PM, Friday, December 3, 2010