The Input Language The
parser's input language is <input> where:
%% A phrase P enclosed in braces { P } is optional %% A phrase P followed by a * is iterated zero or more times, e.g. <def>+ means %% the concatenation of zero or more <def> phrases %% A phrase P followed by a + is iterated one or more times, e.g. <def>+ means %% the concatenation of one or more <def> phrases %% If a phrase P enclosed in brackes { P } is followed by a * or +, the braces are %% simply denote grouping: the enclosed phrase is iterated as specified by the %% * or + symbol <input> ::= <token>* <token> ::= <alpha/other> <alpha/other/numeric>* | <delimiter> | <operator> %% Adjacent tokens must be separated by whitespace (a non-empty sequence of %% spaces, tabs, and newlines) unless one of the tokens is a delimiter or %% operator. %% In idenitifying operators, the lexer chooses the longest possible match. %% Hence, "<=" is %% interpreted as a single token rather than "<" followed by "=" <alpha/other/numeric> ::= <alpha/other> <digit> <alpha/other> ::= <lower> | <upper> | <other> <lower> ::= a | b | c | d | ... | z <upper> ::= A | B | C | D | ... | Z <other> ::= ? | _ <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 <delimiter> ::= ( | ) | [ | ] | , | ; | "{" | "}" <operator> ::= "+" | - | ~ | "*" | / | = | != | < | > | <= | >= | & | "|" | ! | <- | := | ref
Jam The set of legitimate Jam phrases is the subset of the potential inputs consisting the expressions <exp> defined by the following grammar:
%% Expressions: <exp> ::= <term> { <binop> <exp> } | if <exp> then <exp> else <exp> | let <def>+ in <exp> | map <id-list> to <exp> <term> ::= <unop> <term> | <factor> { ( <exp-list> ) } | <null> | <int> | <bool> <factor> ::= ( <exp> ) | | <prim> | <id> <exp-list> ::= { <prop-exp-list> } <prop-exp-list> ::= <exp> | <exp> , <prop-exp-list> <id-list> ::= { <prop-id-list> } <prop-id-list> ::= <id> | <id> , <prop-id-list> %% Definitions: <def> ::= <id> := <exp> ; %% Primitive Constants, Operators, and Operations: <null> ::= null <bool> ::= true | false <unop> ::= <sign> | ~ <sign> ::= "+" | - <binop> ::= <sign> | "*" | / | = | != | < | > | <= | >= | & | "|" <prim> ::= number? | function? | ref? | list? | null? | cons? | cons | first | rest | arity %% Identifiers: <id> ::= <alpha/other> {<alpha/other> | <digit>}* %% except for <prim> and the keywords if, then, else, %% map, to, let, in, null, true, false and the operator ref <alpha/other> ::= <alpha> | <other> %% <other> as defined above <alpha> ::= <upper> | <lower> %% <upper> and <lower> as defined above %% Numbers: <int> ::= <digit>+
The lexer recognizes keywords, delimiters (parentheses, commas, semicolons),
primitive operations, identifiers, and numbers. The set of recognized
operators includes two unary operators,
namely ref
and !
, that are unused in our current dialect of
Jam.
The preceding grammar requires one symbol lookahead in a few
situations.
The Scheme lexer in the file lexer.ss supports a peek
operation precisely for this purpose. Given a token stream p,
an application of the
form (p ) where
is any argument (e.g., the symbol
'peek) returns a copy of the next token in p without removing
it from (or otherwise changing) p.
In the Java lexer, the method Token peek() behaves exactly like the method Token readToken() except for the fact that it leaves the the scanned token at the front of the input stream (in contrast readToken() removes the scanned token from the input stream).
Abstract Syntax Trees As mentioned above, the Scheme library lexer.ss defines the set of constructors used to build abstract syntax trees in Scheme. There is one constructor for each different form of expression in the definition of Jam syntax given above. In addition, there is one abstract syntax form for Jam definitions, which can easily express both concrete forms given above.
For example, suppose the Jam program under consideration is
f(x) + (x * 12))
The abstract syntax representation for this program phrase is:
(make-biop-exp '+ (make-app-exp (make-id-exp 'f) (list (make-id-exp 'x))) (make-biop-exp '* (make-id-exp 'x) (make-num-exp 12)))in the context of the following data descriptor definitions:
(define-structure (biop-exp rator rand1 rand2)) (define-structure (app-exp rator l-of-rands)) (define-structure (id-exp arg)) (define-structure (num-exp arg))
The Java file lexer.java defines all of the abstract syntax classes required to represent Jam programs. There is one constructor for each different form of expression in the definition of Jam syntax given above. Some primitive (non-recursive) abstract syntax classes are the same the corresponding token classes. There is one abstract syntax form for Jam definitions, which can easily express both concrete forms given above.