Programming Assignment 7:
Machine Level Representation and Management of Jam Data Structures

 

Points: 150

Files

Start with your interpreter from Assignment 6.

Tests

The support files provided in the resources for Project 7 at Piazza are identical to those for Assignment 6 except for (i) a minor revision of the definitions of Evaluator, SymEvaluator and (commented out) SDEvaluator that eliminates the overriding of concrete methods (and preserves the semantics of the Assignment 6 code) and (ii) the addition of more documentation for the code supporting the static distance representation of Jam programs (which was unused in Assignment 6). You are encouraged but not required to use the cleaned up version of the support code.

Overview

Preparation   Create a programs/7 directory within your comp411 directory. All of your files for this assignment should be stored in the programs/7 subdirectory. Do not put files in subdirectories below programs/7.

Teamwork   You are required to do this assignment and all other programming assignments using pair programming. When you submit your assignment, indicate the composition of your team (names, student ids, and email addresses) in your README file.

If you cannot find a partner and would like one, post a message to our Piazza page and we will try to find one for you. Teams of more than two members are not permitted.

Details

Like Assignment 6, this is a long, challenging assignment and will be worth 150 points instead of 100 points.

Your assignment is to create a Jam interpreter that converts Jam programs to use a static distance representation for variables and represents all Jam values (JamVals) and activation records as indices in (pointers into) a simulated memory array of integers. All Jam run-time values including environments must be explicitly represented using the simulated memory array managed by your interpreter rather than run-time system of the underlying Java Virtual Machine. In other words, you will implement your own "new" operations for Jam run-time objects instead of relying on corresponding operations in Java. All of your Jam run-time objects--including JamVals and activation records--will be stored in a large array of integers, mimicking machine memory. In addition, all intermediate JamVals (often called "temporaries") will be stored in a "register" array of Java integers. The latter constraint is essential for writing a garbage collector that moves objects, which is an extra credit addendum to this assignment (described below). If only you store intermediate JamVals in variables in your interpreter (other than the "register" array), the garbage collector cannot find them when it moves objects.

The assignment is most easily done in four phases: (i) writing a translator to convert Jam programs expressed as conventional symbolic abstract syntax trees (type SymAST in Assignment 6) to abstract syntax trees that use static distance coordinates for variables rather than explicit symbols; (ii) converting your Jam interpreter from Assignment 6 to interpret programs expressed using static distance coordinates which requires changing the representation of Jam environments; (iii) representing Jam code in a separate code array (akin to code memory in a JVM except that our Jam code will be represented as SDASTs instead of Java byte code); and (iv) the conversion of your interpreter to perform explicit allocation of all Jam objects (including activation records) in a separate memory array and the storage of all intermediate JamVals in an array of registers.

The first, second, and third phases of the assignment are short exercises. Phase two would be a significant programming task without the support code provided for Assignment 6 which already includes much of the data design and some of the code required for this phase, namely

Phase four involves some grubby details (representing Jam data objects using an explicit memory array) but it is easy to code.

A compiler stores the intermediate results of chains of primitive operations in machine registers and if necessary, a spill area in current activation record. We will simulate this approach by defining a global stack of registers (temporaries) that hold references to the Jam heap (or pseudo-references like a special code for Jam 'null'). A real compiler must do more work because it must determine which temporary quantities are stored in registers and which are stored in space it reserves for temporaries in the activation record.

To manage our global stack of temporaries, we will push all intermediate results on this stack while they are active and pop them off when they are no longer needed. Every pushed temporary must eventually popped to avoid a storage leak. It is easy to write this code such that push and pop operations are always paired in the same procedure of your interpreter. The stack plus a pointer to the current activation record is your root set for performing a garbage collection.

Note that we do not need to allocate any space in activation records for temporaries since we have a dedicated global stack of registers for this purpose. We recommend representing this stack of registers as an ArrayList. The element type depends on the interpreter. For the initial high-level static distance coordinate interpreter, the elements have type JamVal. For the final low-level static distance interpreter that uses an explicit heap memory array, the elements have type int corresponding to indices in the heap (or negative pseudo-indices used to represent the constants null, unit true, and false.

To help you aggressively test the various phases of your program, we are providing the test program called sieve in the tests subdirectory. We also recommend converting some test programs from previous assignments to the syntax required for this assignment (recall that let is not recursive and letrec only accepts maps as right-hand-sides. To execute the sieve program, you will have to make one change to your interpreter, namely eliminating the run-time check to confirm that the second argument to cons is a list. In this program, the second argument is often a thunk (map of no arguments).

Phase Specifications

Phase I.   As the first phase of this assignment, you must write a static distance converter that maps an input program represented as a conventional abstract syntax tree (SymAST) to a corresponding static distance abstract syntax tree (SDAST). In the support code the types SymAST and SDAST share many node types. Only map, let, letrec, and variables are represented differently. In addition, letcc is legal in a SymAST but illegal in a SDAST. Of course, all of the subtrees of type AST (conceptually the union of the SymAST and SDAST types) embedded in a SymAst (SDAST) must also be of type SymAST (SDAST). Since we cannot statically enforce this restriction, we must use casts in the code from AST to SymAST (SDAST) where appropriate.

Phase II.   In the second phase, you must write a Jam interpreter for SDASTs. You should share as much code with your existing SymAST interpreter as possible. The conventional Jam interpreter and static distance interpreter share implementations for the methods in the interface ComASTVisitor. In practice, a ComASTVisitor is not useful because it cannot be applied to either a SymAST or a SDAST. Rather than defining an abstract class for interpreting the nodes common to SymASTs and SDASTs (as identified in ComASTVisitor), the support code defines a corresponding abstract class for interpreting the nodes in the union of SymAST and SDAST (designated as type AST in the support code) where the methods for processing nodes not in ComASTVisitor are left abstract except for the code to process a Letcc node which happens to be identical in both conventional AST interpretation and static distance AST interpretation. Since the support code already includes (commented out) top level code for interpreting SDASTs, this phase is easy. You simply need to write the code for the auxiliary methods that support the SDAST interpreter, namely a lookup method for a receiver of type SDEnv and a static distance coordinate (a Pair) argument.

Phase III.   In the third phase, you must write a new Jam interpreter for SDASTs that is a modest revision of the interpreter from Phase II. The only changes are that:

To support the idea that each map in the input program corresponds to a compiled subroutine, you must add a private int field named codeIndex to the SMap class and bind this field in each SMap instance that you create when performing conversion to static distance form. The first map encountered in converting the program will have index 0 (since Java arrays are indexed starting at 0) and each subsequent map will be inserted in the next open slot in the code array table. This table will be implemented simply as a Java array of type SDAST[]. You can build the contents of the array as a Java ArrayList so it can be easily constructed as a Java List and subsequently accessed efficiently using randomly access. This code array will help support the explicit representation of closures including the code body for the closure (identified by a codeIndex) in a memory array in Phase IV.

The code array loosely mimics the compiled code for the program. Since we are interpreting rather than compiling Jam programs, we will still represent program expressions as SDASTs, but we will also support referring to map expressions in the program using integer indices. Each map expression will be represented by an index into a code array (starting with index 0 since Java array indices start at 0) of type SDAST. Of course, the type of a map expression in the SDAST is still SMap.

Modify the representation of closures in your static distance interpreter from Assignment 6 so that the code portion is represented by an integer index. (When you build a closure for a map M, store the integer index for M in the code field instead M itself.)

Note: we could retain the exact representation of SDASTs from Phase II (instead of slightly modifying the definition of SMap) nodes by defining a subclass of SMap (named IndexedSMap) that adds the codeIndex field. Then the revised interpreter would require an SDAST with all maps represented by IndexedSMap nodes. In this assignment, we are ultimately interested in SDASTs that are indexed so we simply revise the definition of SMap. Since the added indices are merely information annotating SMap nodes, the static distance interpreter from Phase II will still work, oblivious to the additional information.

You can test your revised static distance coordinate interpreter by comparing its behavior with the interpreter from Phase II.

Since Java does not perform tail-call optimization, the interpreters in this Assignment can create very deep stacks. To run one of our tests (involving the a stream-based prime sieve function), we had to use a stack size of 512M, which is enormous. Tail-call optimization matters!

Phase IV.   In the fourth and final phase, you must modify the representation of JamVals and activation records so that all Jam objects are represented by blocks in an explicit heap memory array. In short, you must reimplement every "new" operation for a JamVal or activation record performed by your static distance Jam interpreter as the allocation of a block of integer cells from your heap array. We mandate the following representation of JamVals and activation records in your memory array so that we can easily test your program. With the exception of the special constants null (the empty list), unit true, and false every JamVal will be represented by the index of a block of two or more cells where the first cell holds a tag identifying the form of data represented. The constants null, unit, true, and false are represented by the pseudo-indices (int values) -1, -2, -3, and -4, respectively.

Excluding null, unit, true, and false, there are five different forms of “new” JamVal (and activation record) operations in your Jam interpreter:

The tag values for int constants, cons nodes, ref nodes, closures, and activation records are 1, 2, 3, 4, and 5, respectively. Note that we have excluded the value 0 from the set of legal tag values to help you debug your code. When you create your heap array, we recommend that you initialize it to zero. If your interpreter ever encounters 0 as a tag value, you know that you have encountered an illegal JamVal or activation record (probably caused by a faulty pointer [index into your explicit memory array]).

Each of the constructed node types (excluding null, unit, true, false) has one of more fields following it:

Note that JamVals (other than the special constants null, unit, true, false) are represented by the indices of (pointers to) the corresponding representations in the memory array. The special constants null, unit, true, false) are represented by the corresponding pseudo-indices which are negative integer tags.

The representations of the special constants are disjoint from the representations of all other JamVals, which are non-negative indices in (pointers into) the heap memory array.

Your interpreter should simply abort any computation that overflows the heap by throwing an EvalException with an appropriate error message.

For Assignment 7, use the same constructor interface for the Interpreter class as in Assignment 6. Simply define a public static final int constant (say HEAPSIZE) as the default size for your heap. I suggest the value (int)Math.power(2,20), or equivalently 1<<20. If you do the extra credit part of the assignment, you will have to add constructors that include heapsize as a second parameter as described below. We need this Interpreter interface to test your garbage collector.

Our test code will assume that your Interpreter class supports the SDEval method described below in addition to the conventional eval method that you supported in Assignment 6. We also recommend (but do not require) that you support the static distance interpreter developed in Phase II in your submitted program under the name origSDEval. It should behave exactly like SDEval.

You must also add a method int[] getMemory() to the Interpreter class that returns your memory array simulating a Jam heap.

The autograding program will use the memory returning operations provided as part of the top level interface to confirm that you are using memory to represent Jam values and check that you are representing Jam values correctly.

Note that your top-level evaluation methods must "decode" your low level Jam data representations into high level Jam data values with exactly the same representation in Java as in previous assignments so that the visible behavior of your interpreter (modulo a sufficiently large heap) is unaffected by using an explicit heap memory array.

The interface that you must support is the following:

    /** Default heap size is ~1 million words */
    public static final int HEAPSIZE = 1<<20;

    public Interpreter(String fileName);
    public Interpreter(Parser p);
    public Interpreter(Reader reader);

    /** Evaluate the program in SymAST form as in Assignment 6. */
    public JamVal eval(); 

    /** Evaluate the program in SDAST form using the explicit heap and return the result decoded as JamVal. */
    public JamVal SDEval(); 

    /** Return the heap. */
    public int[] getMemory(); 

    /** Return the SymAST form for the program. */
    public AST getSymAST(); 

    /** Return the SD form for the program. */
    public AST getSDAST(); 
    

In addition we recommend that you support the method

    /** Evaluate the program in SD form and return the result. */
    public JamVal origSDEval(); 
    

because it enormously simplifies testing your SDEval method. The input-output behavior of SDEval and origSDEval should be identical. In grading your program, we will have to use the getMemory method to examine your heap to verify that you are using an explict heap in implementing SDEval.

In your Interpreter, you should maintain fields for the SymAST and SDAST representations of the input program. The getters for these fields should construct the corresponding representations on demand if they have not already been constructed. Constructing the SymAST implies parsing the input stream and performing the requisite static checks on the contructed SymAST. During the syntax checking process, you may also construct the SDAST representation (since both tasks require a traversal of the SymAST representation. If your syntax checker does not construct the SDAST (the class solution does not), then perform SDAST construction (conversion to static distance form) when getSDAST is called (which, of course, can force parsing and syntax-checking to build the SymAST representation if it does not already exist).

Extra Credit

You can earn 100 points extra credit by implementing a Cheney copying collector for your heap to reclaim unused storage in your heap array. You should allocate a new memory heap array (of the specified size) for each collection instead of using alternating semi-spaces.

The only root for collection is the address of the current activation record in the heap (the value of the env pointer) and the contents of your stack of temporary registers. Since all temporary values are stored in the temporary register stack, we do not need to apply the CPS transformation to Jam programs in order to support garbage collection. Our solution stores the current activation on top of the register stack so the register stack is the entire root set. You can store any JamVal representation, either the index of an object in your memory heap or a negative pseudo-index (representing a JamVals null, true, false, or unit).

In your Interpreter   class, create a two argument constructor corresponding to each constructor in code provided for Assignment 6. The second argument is an int specifying the number of cells in your memory array. We need this interface to test your garbage collector on smaller heap sizes than 220 cells.

Cheney collection is easy to code but may be hard to debug. We don't want you to spend long hours debugging your Cheney collector if you have trouble getting it to work. Remember this is an extra credit part of the assignment.

Testing   You are expected to write unit tests for all of your non-trivial methods or functions and submit your test code as part of your program. For more information, refer back to Assignment 1.