next up previous
Next: Testing and Submitting Your Up: C311 - Program 1: Syntax, Previous: C311 - Program 1: Syntax,

Input Language Specification:

The following paragraphs define the parser's input language and the subset that your parser is supposed to recognize a legal Jam program.


The Input Language The parser's input language is <input> where:

%% A phrase P enclosed in braces { P } is optional
%% A phrase P followed by a * is iterated zero or more times, e.g. <def>+ means
%%   the concatenation of zero or more <def> phrases
%% A phrase P followed by a + is iterated one or more times, e.g. <def>+ means
%%   the concatenation of one or more <def> phrases
%% If a phrase P enclosed in brackes { P } is followed by a * or +, the braces are
%%   simply denote grouping: the enclosed phrase is iterated as specified by the
%%   * or + symbol

  <input> ::= <token>*

  <token> ::= <alpha/other> <alpha/other/numeric>* | <delimiter> | <operator>

%% Adjacent tokens must be separated by whitespace (a non-empty sequence of 
%% spaces, tabs, and newlines) unless one of the tokens is a delimiter or 
%% operator.
%% In idenitifying operators, the lexer chooses the longest possible match.  
%% Hence, "<=" is %% interpreted as a single token rather than "<" followed by "="

  <alpha/other/numeric> ::=  <alpha/other> <digit> 

  <alpha/other> ::= <lower> | <upper> | <other> 

  <lower> ::= a | b | c | d | ... | z

  <upper> ::= A | B | C | D | ... | Z

  <other>  ::= ? | _ 

  <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

  <delimiter> ::= ( | ) | [ | ] | , | ; | "{" | "}"

  <operator> ::= "+" | - | ~ | "*" | / | = | != | < | > | <= | >= | & | 
                 "|" | ! | <- | := | ref

Jam The set of legitimate Jam phrases is the subset of the potential inputs consisting the expressions <exp> defined by the following grammar:

%% Expressions:
  <exp> ::= <term> { <binop> <exp> }
	  | if <exp> then <exp> else <exp>
	  | let <def>+ in <exp>
	  | map <id-list> to <exp>

  <term> ::= <unop> <term>
	   | <factor> { ( <exp-list> ) }
	   | <null> 
	   | <int>
	   | <bool>

  <factor> ::= ( <exp> ) |
	     | <prim>
	     | <id>

  <exp-list>      ::= { <prop-exp-list> }
  <prop-exp-list> ::= <exp>   | <exp> , <prop-exp-list> 
  <id-list>       ::= { <prop-id-list> }
  <prop-id-list>  ::= <id>    | <id> , <prop-id-list> 

%% Definitions:
  <def> ::= <id> := <exp> ;

%% Primitive Constants, Operators, and Operations: 
  <null>  ::= null
  <bool>  ::= true | false 

  <unop>  ::= <sign> | ~ 
  <sign>  ::= "+" | -
  <binop> ::= <sign> | "*" | / | = | != | < | > | <= | >= | & | "|"
  <prim>  ::= number? | function? | ref? | list? | null? | cons? | cons
            | first | rest | arity

%% Identifiers:
  <id>  ::= <alpha/other> {<alpha/other> | <digit>}*
            %% except for <prim> and the keywords if, then, else,
            %%   map, to, let, in, null, true, false and the operator ref

  <alpha/other> ::= <alpha> | <other>  %% <other> as defined above
  <alpha> ::= <upper> | <lower>        %% <upper> and <lower> as defined above


%% Numbers:
   <int>  ::= <digit>+


The lexer recognizes keywords, delimiters (parentheses, commas, semicolons), primitive operations, identifiers, and numbers. The set of recognized operators includes two unary operators, namely ref and !, that are unused in our current dialect of Jam.

The preceding grammar requires one symbol lookahead in a few situations. The Scheme lexer in the file lexer.ss supports a peek operation precisely for this purpose. Given a token stream p, an application of the form (p $x$) where $x$ is any argument (e.g., the symbol 'peek) returns a copy of the next token in p without removing it from (or otherwise changing) p.

In the Java lexer, the method Token peek() behaves exactly like the method Token readToken() except for the fact that it leaves the the scanned token at the front of the input stream (in contrast readToken() removes the scanned token from the input stream).

Abstract Syntax Trees As mentioned above, the Scheme library lexer.ss defines the set of constructors used to build abstract syntax trees in Scheme. There is one constructor for each different form of expression in the definition of Jam syntax given above. In addition, there is one abstract syntax form for Jam definitions, which can easily express both concrete forms given above.

For example, suppose the Jam program under consideration is

                   f(x) + (x * 12))

The abstract syntax representation for this program phrase is:

(make-biop-exp '+
  (make-app-exp (make-id-exp 'f) (list (make-id-exp 'x)))
  (make-biop-exp '* (make-id-exp 'x) (make-num-exp 12)))
in the context of the following data descriptor definitions:
(define-structure (biop-exp rator rand1 rand2))
(define-structure (app-exp rator l-of-rands))
(define-structure (id-exp arg))
(define-structure (num-exp arg))

The Java file lexer.java defines all of the abstract syntax classes required to represent Jam programs. There is one constructor for each different form of expression in the definition of Jam syntax given above. Some primitive (non-recursive) abstract syntax classes are the same the corresponding token classes. There is one abstract syntax form for Jam definitions, which can easily express both concrete forms given above.


next up previous
Next: Testing and Submitting Your Up: C311 - Program 1: Syntax, Previous: C311 - Program 1: Syntax,
Corky Cartwright 2002-08-29