Assignment 7 (extra credit):
No supporting code. Use your own interpreter from Assignment 6.
Preparation Create a programs/7 directory within your comp311 directory. All of your files for this assignment should be stored in the programs/7 subdirectory. Do not put files in subdirectories below programs/7.
Teamwork You are required to do this assignment and all other programming assignments using pair programming. When you submit your assignment, indicate the composition of your team (names, student ids, and email addresses) in your README file.
If you cannot find a partner and would like one, send an email to email@example.com and we will try to find one for you. Teams of more than two members are not permitted.
Task Your assignment is to change the representation of the data structures in the Jam run-time environment so that all such storage is explicitly managed by your interpreter rather than run-time system of the underlying language (Java, OCaml or Scheme). In other words you will implement your own "new" operations for Jam run-time objects instead of relying on corresponding operations in Java, OCaml or Scheme. All of your Jam run-time objects will be stored in a large array of integers, mimicking machine memory. The assignment is most easily done in four phases: (i) conversion of Jam programs to "complete CPS" form, (ii) explicit representation of Jam code, (iii) explicit allocation of all Jam objects without garbage collection, and (iv) support for storage reuse using garbage collection. The first two phases are short exercises.
The "complete CPS" transformation converts all calls on program defined procedures to tail-calls so that they can be implemented as jumps in a compiler. It also means that all live storage (except for temporaries created to hold the intermediate results of chains of primitive operations) is accessible from the current activation record even though we are not explicitly storing a dynamic link; the continuation closure includes the dynamic link as its environment field.
A compiler stores the intermediate results of chains of primitive operations in machine registers and if necessary, a spill area in current activation record. We will simulate this approach by defining a global stack of registers (temporaries). A real compiler must do more work because it must determine which temporary quantities are stored in registers and which are stored in space it reserves for temporaries in the activation record.
To manage our global stack of temporaries, we will push all intermediate results on this stack while they are active and pop them off when they are no longer needed. Every pushed temporary must eventually popped to avoid a storage leak. It is not difficult to write this code such that push and pop operations are always paired in the same procedure of your interpreter. The stack plus a pointer to the current activation record is your root set for performing a garbage collection.
To help you aggressively test the various phases of your program, we are providing the test program called sieve in the tests subdirectory. We also recommend converting some test programs from previous assignments to the syntax required for this assignment (recall that let is not recursive and letrec only accepts maps as right-hand-sides. To execute the sieve program, you will have to make one change to your interpreter, namely eliminating the run-time check to confirm that the second argument to cons is a list. In this program, the second argument is often a thunk (map of no arguments).
Phase I As a first phase of this assignment, you will modify your CPS converter from Assignment 6 to perform "complete CPS conversion" (excepting primitive operations) and clean up a few rules from the previous assignment that are unnecessarily verbose. In Assignment 6, the CPS converter eliminated all nested procedure calls, but we still permitted nested let constructions. If we were compiling Jam to machine code, the CPS form from Assignment 6 provides the minimum restructuring required to produce tail calling code where all procedure calls are implemented as jumps (procedures never return!). Nested let and letrec constructions can be "flattened" into a single rec-let provided that all variable names bound in the collection of let and letrec constructions are uniquely named (so the flattening does not capture bound variables).
let x1 := let x2 :=E2; in B2; in B1
must return the value of the nested let expression to the context of the outer let expression.
Fortunately, it is very easy to the modify the CPS process from Assignment 6 to eliminate such nested let/letrec constructions. All that is required is a small change to the definition of "simple expression". In particular, no let or letrec construction is simple. Hence, the only change that you need to make to your CPS converter from Assignment 6 is to change the clauses for let and letrec in the definition of the "is-a-simple-expression" predicate.
For our Jam interpreters, however, we need a more thorough form of CPS conversion to eliminate nested subcomputations. In a Jam interpreter, the evaluation of a let or letrec construction introduces a new activation record (environment frame). Hence, the evaluation of a let construction of the form
Phase IA Modify the rules for CPS conversion of let expressions as follows:
In the both of these transformations, the list of bindings in a generated let construction can be empty. In the case, you simply contract the let construction to its body.
This modification slightly simplifies the form of CPS output which tends to be very wordy in the absence of simplifying transformations (such (map x to x)(E) => E). In this assignment, we choose not to perform these simplifications for the sake of brevity.
Phase II In the machine-level representation of Jam data introduced in Phase III, you must represent all the fields of a closure object as ordinary integers. For the environment field, the proper choice of integer representation is obvious since an environment is simply a pointer (integer index) to an activation record allocated in your heap array. But the code portion of the closure is not stored anywhere in your heap array. To identify the code bodies of closures using integers, we must build an auxiliary table containing each map expression that appears in the program text. When you convert program text to static distance representation, you can enter each map that you encounter is an array of ASTs and use the integer index to refer to the map. You can include a field to store this index in the static distance coordinate version of the map abstract syntax "class".
Modify the representation of closures in your static distance interpreter from Assignment 6 so that the code portion is represented by an integer. (When you build a closure for a map M, store the integer index for M in the code field instead M itself.)
Phase III In this phase, you must reimplement every "new" operation performed by your static distance Jam interpreter as the allocation of a block of integer words from your heap array. We suggest a representation scheme in which every Jam data value and run-time data object is represented by an integer reference to a block of memory in your heap array.
There are five different forms of “new” in your Jam interpreter:
Each of these forms must be represented by a reference to a block of integer words in your heap array. Each of these five forms of Jam data must be distinguishable at run-time. A simple but effective way to distinguish the various forms or records is to use the lead word as a header (tag).
The null constant and unit constant do not appear in the list of allocated forms of data above because there is only a single value of each these forms. Such constants can be represented most conveniently by using a "reserved" reference "address"; zero or a small negative number like -1 is a good choice. Of course no object can be stored at this reserved address in your heap array. Alternatively, you can use a conventional tagged record representation but only allocate one copy of each value. Then the address of this single copy can be used as the unique representation for the value. (The addresses of any such unique copies must be included in the root set!)
In this phase you will modify your interpreter to this new representation for Jam run-time data. Your interpreter will simply abort any computation that runs out of heap memory.
To reduce the amount stack space used in the evaluation of expressions involving constructor operations like cons, you can preallocate the constructed record (with null fields) before evaluating the argument expressions and store the result for each field in the preallocated record as it computed.
Java If you are writing the assignment in Java, add a second argument int heapSize to each visible constructors to your Interpreter class. This argument specifies the number of cells ("words") in your memory array. You must also add a method int getMemory() to the Interpreter class that returns the memory array to simulate a Jam heap.
Scheme If you are writing the assignment in Scheme, add a second argument of integer type to the functions make-sd-cps-eval-file and make-sd-cps-eval-string. This argument specifies the number of cells (”words”) in your memory array. In addition, you must change the return type of these two functions. They must return functions of one argument (a symbol) instead of thunks. The returned function will perform the specified evaluation if it is passed the symbol ’value as an argument. The returned function will return the memory array used to simulate the Jam heap if it is passed the symbol ’memory.
OCaml If you are writing the assignment in OCaml, add a second argument of integer type to the functions sd_cps_eval_file and sd_cps_eval_string. This argument specifies the number of cells (”words”) in your memory array. In addition, you must change the return type of these two functions. They must return functions of one argument (a "return mode") instead of thunks. The returned function will perform the specified evaluation if it is passed the mode RETVALUE as an argument. The returned function will return the memory array used to simulate the Jam heap if it is passed the mode RETMEMORY.
type retmode = RETVALUE | RETMEMORY;; type retvalue = RValue of value | RMemory of int array ;; sd_cps_eval_file: string -> int -> retmode -> retvalue sd_cps_eval_string: string -> int -> retmode -> retvalue
The autograding program will use the memory returning operations provided as part of the top level interface to confirm that you are using memory to represent Jam values.
Note that your top-level evaluation methods (functions) must ”decode” your low level Jam data representations into high level Jam data values with exactly the same representation (in Java, Scheme or OCaml) as in previous assignments.
If you run out of memory, you should throw an EvalException, EvalError or an error with a message containing the substring 'eval, depending on what language you use, just like in the previous assignments.
Phase IV Write a Cheney copying collector to reclaim unused storage in your heap array. The only root for collection is the current activation record (the value of the env pointer) and any values that have been computed as arguments for the next activation record (the value of the pending args stack).
Since garbage collection is only safe for code that you have converted to CPS, delete the Interpreter methods (in Java) or top-level functions (in Scheme or OCaml) that support the static distance interpretation of Jam without performing a CPS transformation.
The directory tests will contain Java, O'Caml and Scheme test files Assign7Test.java, assign7test.ml, and assign7test.ss containing some very simple unit tests for your program.
Your program directory should only contain the files that you want to submit. Make sure that your program passes the sample unit tests and works with our grading script before submitting it. You must provide a README file in the directory programs/7 with the following format:
firstname.lastname@example.org email@example.com(in either order) and nothing else.
Any test data files for your program must be stored in the programs/7 directory. Do not store anything in subdirectories below programs/7.
Each procedure or method in your program should include a short comment stating precisely what it does. For routines that parse a particular form of program phrase, give grammar rules describing that form.
To submit your program, make sure that everything that you want to submit is located in the directory programs/7 and type the command
turnin311 -t[scheme|java|ocaml] 7
from within the comp311/programs directory. For example, if you have an O'Caml version you would execute
turnin311 -tocaml 7
The command will inform you whether or not your submission succeeded. Only submit one copy of your program per team. If you need to resubmit an improved version your program, submit it from the same account as the original so that the old version is replaced.
Note if turnin311 is not on your path, you will need to add /home/comp311/bin to your path: