next up previous
Next: Input Language Specification:

C311 - Program 1: Syntax, Parsing, Abstract Syntax

Corky Cartwright

Produced: August 29, 2002-- Due: 11:59 P.M., Sunday, September 8, 2001

Overview


Preparation: We suggest that you run the OwlNet register command for comp311, which will make the latest versions of drscheme and drjava available as commands in your OwlNet environment.

For more information about DrScheme, read the documentation in

http://www.cs.rice.edu/CS/PLT/packages/drscheme/.
DrJava is distributed as a Java .jar file at the web site drjava.info. You can run DrJava on a Unix system (with Java JDK 1.3 or 1.4 installed) by typing
java -jar drjava.jar
assuming that drjava.jar is located in the current directory. On Windows machines (with Java JDK 1.3 or JDK 1.4 installed), you can run the drjava.jar file simply by clicking on the icon.

Documentation for DrJava is available at drjava.info. In addition, if you have questions about DrJava, send email to

javaplt@rice.edu.

Documentation on the JSR14 extension of Java is available at

http://www.cis.unisa.edu.au/~pizza/gj/..
We strongly recommend following the documentation link and reading the GJ Tutorial. The download site for JSR14 is http://developer.java.sun.com/. Follow the link listed under the heading <B>Early Access Downloads</B>.

A good article on the use of Generic Java in conjuction with unit testing is available the the htmladdnormallinkIBM DeveloperWorks website. http://www-106.ibm.com/developerworks/java/library/j-diag0625.html?dwzone=java.

Every student should create a comp311 directory in his or her home directory. Each programming team should submit their solution under the userid one team member. We recommend creating a a programs/1 directory within the directory comp311. All of your files for this assignment should be stored in the programs/1 subdirectory.


TeamWork: You are strongly encouraged to do this assignment and all other assignments in this course in teams of two. When you submit your assignment, indicate the composition of your team (names, student id's, and email addresses) and to what extent you followed the pair-programming model in your README file.

If you cannot find a partner and would like one, send James Sasitorn (camus@rice.edu) email and he will try to find one for you. Teams of more than two members are not permitted.


TeamWork: You are expected to write unit tests for all of your non-trivial methods or functions, depending on whether you write your assignments in Java or Scheme. DrJava provides integrated support for JUnit which makes this task easy. In Scheme, you can write unit tests in the style taught in Comp 210 with one addition. You will need to write a function <TT>(assert-equal x y m)</TT> that tests whether <TT>(equal? x y)</TT> and calls <TT>(error m)</TT> if the test returns <TT>false</TT>. Unit tests are simply calls on this function included in your program definitions file.

Note that this unit testing requirement conflicts in some cases with the name hiding recommendations given for writing Scheme code in Comp 210 because lexically nested function definitions cannot be tested. In you Scheme programs all non-trivial functions must be accessible at the top-level.

If you are adventurous, you may try using a new Scheme unit testing tool available as on open source tool from sourceforge. It is not clear if the tool will run with DrScheme version 103 (which is the version of the DrScheme that we support). You may have to run DrScheme version 200 and convert the class libraries to use the new DrScheme module system.


Task: Your task is to write a parser in Generic Java or Scheme that recognizes expressions in the illustrative language Jam, which will be defined subsequently. The course staff has provided lexers in both Scheme and Java for this assignment. These lexers return a larger set of symbols than what appears in the Jam definition. These extra symbols anticipate future extensions to Jam. At this point, your parser should simply reject any program containing one of these extra symbols as ill-formed, just like it would for any other syntax error.


Scheme Version:

In Scheme, the parser's input language is a stream of tokens implemented as a parameterless ``read'' procedure p that encodes a token stream. Given the procedure p, the application (p) returns the next token in the encoded stream, which is either a Scheme symbol, a Scheme integer, or a special end-of-file object that is recognized by the predicate eof-object?.

The Scheme library file rawlexer.ss implements a scanner (lexical analyzer) that converts either a file or a string to a corresponding token stream. Given a string s specifying a file name, the application (make-file-lexer s) returns the corresponding token stream. Given a string s consisting of program text, the application (make-string-lexer s) returns the corresponding token stream.

The parser's output language is Jam abstract syntax, which is defined in the library file abs.ss. If the parser determines that some given input does not represent a legal Jam program, it should print an appropriate error message and terminate the parsing process. In Java, you can abort the computation by throwing a ParseException for which there is no catch handler.

Note that rawlexer.ss and abs.ss are Teachpack (library) files and should be loaded into DrScheme using the AddTeachpack command on the Language menu. You can load both Teachpack files rawlexer.ss and abs.ss into DrScheme as a single TeachPack library by designating the file lexer.ss as the TeachPack to be loaded. lexer.ss defines a composite module containing both rawlexer.ss and abs.ss. These library files are available on the web in the directory www.cs.rice.edu/~cork/311/Assignments/1/scheme.

Your parser should be expressed as a function parse that takes a string specifying a file name as its only argument and returns an abstract syntax tree for the Jam expression in the specified file. Your Scheme program should be stored in the file parser.ss, which will be recognized by the turnin311 command and our grading program.


Java (GJ) Version:

In Java, the parser's input token stream is provided by a Lexer object supporting the parameterless method readToken(). The readToken() method produces a stream of Token objects belonging to various subclasses of the Token interface. The Lexer class supports two constructors: a zero-ary constructor that converts standard input to a Lexer and a unary constructor Lexer(String fileName) that converts the specified file to a Lexer. The file lexer.java contains the definition of the Lexer class and the collection of classes representing tokens (the Java interface Token) and abstract syntax trees (the java interface AST).

Your parser should be expressed as a class Parser with constructors Parser(Reader stream) and Parser(String filename) which create parsers reading the specified stream or file. The provided Lexer class includes two similar constructors Lexer(Reader stream) and Lexer(String filename). The Reader form of these constructors is very convenient when you are testing your parser from the DrJava read-eval-print loop or from unit test methods.

The Parser class should contain a public method AST parse() that returns the abstract syntax tree for the Jam expression in the input stream. Our grading program will print the output using the method System.out.println, which invokes the toString method on its argument. All of the requisite abstract syntax classes including toString methods are defined in the file lexer.java. This library files are available on the web in the directory www.cs.rice.edu/~cork/311/Assignments/1/java.

If your parser encounters an erroneous input, simply print an explanation of the syntax error and abort parsing the file (using the error primitive Scheme and the throwing of a ParseException in Java. Recovering from a syntax error to continue parsing is a complex problem with many special cases. It is not a realistic goal for this assignment.


Choice of Language: Generic Java and Scheme are the only languages permitted for this assignment.




next up previous
Next: Input Language Specification:
Corky Cartwright 2002-08-29