The original grammar for Jack expressions is
expression | ::= |
term (op term)* |
term | ::= | int-const |
string-const | keyword-const | var-name | (var-name '[' expression ']') | subroutine-call | ( '(' expression ')' ) |
(unary-op term) |
op | ::= |
( '+' |
'-' |
'*' |
'/' |
'&' |
'|' |
'<' |
'>' |
'=' ) |
unary-op | ::= |
( '-' |
'~' ) |
My compiler uses recursive descent as suggested in the book. Its operator precedence is
left to right associative for binary operators, with higher precedence right to left associative
unary operators. The unary operator behavior comes from the production
New Grammar for Expression
The operator precedence that I want to implement in Jack is based on the C and Java precedence.
Precedence | Operators | Association |
Highest | − ~ | right to left | (unary operators) |
| * / | left to right |
| + − | left to right |
| < > = | left to right |
| & | left to right |
Lowest | | | left to right |
Assigning operator precedence doesn't change the syntax, only the semantics, but assigning
each level of equal precedence operators to their own syntax element makes implementing this
operator precedence easy using recursive descent.
The expression must be defined to consist of nested sub-expression types. Each sub-expression type
consists of higher precedence sub-expressions combined with this type's operator(s).
expression | ::= | or-expr |
or-expr | ::= | and-expr (or-op and-expr)* |
and-expr | ::= | cmp-expr (and-op cmp-expr)* |
cmp-expr | ::= | add-expr (cmp-op add-expr)* |
add-expr | ::= | mult-expr (add-op mult-expr)* |
mult-expr | ::= | term (mult-op term)* |
term | ::= | int-const |
string-const | keyword-const | var-name | (var-name '[' expression ']') | subroutine-call | ( '(' expression ')' ) |
(unary-op term) |
or-op | ::= |
'|' |
and-op | ::= |
'&' |
cmp-op | ::= |
( '<' |
'>' |
'=' ) |
add-op | ::= |
( '+' |
'-' ) |
mult-op | ::= |
( '*' |
'/' ) |
unary-op | ::= |
( '-' |
'~' ) |
|
Term and unary-op are unchanged from their original definitions. |
Expression and or-expr are separated for clarity. Implementation
will probably collapse them into a single function. |
How it Works
The key to understanding how this works is to realize that in the stack-based model all expressions
can be written with the pushes in the original left-to-right order as they appear in the expression;
the only thing that evaluation order changes is the position of the operators. Consider the
expression
a+b*c=d+e and its stack-based code:
Original Precedence (((a+b)*c)=d)+e |
push a |
push b |
+ | |
push c |
* |
push d |
= |
push e |
+ |
| |
New Precedence (a+(b*c))=(d+e) |
push a |
push b |
push c |
* |
+ |
pushd |
push e |
+ |
= |
|
This means that if the recursive descent parse/compilation routines write the pushes when they parse
terminal elements, and write the operators after parsing the operator's arguments, the correct
stack-based code will be written. This is effectively the same as the original version compiling
a parenthesized expression; the original recursed to compileExpression when it encountered
a '(' in compileTerm.
Original Compiler a+(b*c) |
---|
compileExpression |
| compileTerm | |
| |_ matches var-name | | push a |
| matches '+' |
| compileTerm |
| | matches '(' |
| | compileExpression |
| | | compileTerm |
| | | |_ matches var-name | | push b |
| | | matches '*' |
| | | compileTerm |
| | | |_ matches var-name | | push c |
| | |_ compileOp('*') | | * |
| |_ matches ')' |
|_ compileOp('+') | | + |
New Compiler a+b*c |
---|
compileExpression |
| compileAndExpr | |
| | compileCmpExpr |
| | | compileAddExpr |
| | | | compileMultExpr |
| | | | | compileTerm |
| | | | | |_ matches var-name | | push a |
| | | | |_ |
| | | | matches '+' |
| | | | compileMultExpr |
| | | | | compileTerm |
| | | | | |_ matches var-name | | push b |
| | | | | matches '*' |
| | | | | compileTerm |
| | | | | |_ matches var-name | | push c |
| | | | |_ compileOp('*') | | * |
| | | |_ compileOp('+') | | + |
| | |_ |
| |_ |
|_ |
Implementation
Notice that all of the
xxx-expr elements have the same structure; only the elements they
reference changes. Once
compileOrExpr has been written, the remaining
compileXxxExpr functions can be quickly written using cut-and-paste.
If you are conversant with function pointers, you can write a parameterized compileExpr
function that all the compileXxxExpr functions can use. In my compiler (written in
Python) they look like this.
def _compileOrExpr(self):
self._compileExpr('or-expr', self._compileAndExpr, self._isOrOperator)
def _compileAndExpr(self):
self._compileExpr('and-expr', self._compileCmpExpr, self._isAndOperator)
def _compileCmpExpr(self):
self._compileExpr('cmp-expr', self._compileAddExpr, self._isCmpOperator)
def _compileAddExpr(self):
self._compileExpr('add-expr', self._compileMultExpr, self._isAddOperator)
def _compileMultExpr(self):
self._compileExpr('mult-expr', self._compileTerm, self._isMultOperator)
def _compileExpr(self, expr, compileSubExpr, isOperator):
## Compiles <expr> := <subExpr> (<operator> <subExpr>)*
compileSubExpr() # first term on stack
while (self.tokenizer.tokenType() == TK_SYMBOL and \
isOperator(self.tokenizer.symbol())):
operator = self.tokenizer.symbol()
self._nextToken()
compileSubExpr() # next term on stack
self._compileOperator(operator) # result on stack
The 'expr' parameter is a string for informational purposes only. It is only used when the
"generate XML" option is enabled. I stripped out the XML related code that was interleaved
with the compilation code so that compileExpr would be easier to read.
I decided to keep compileOrExpr separate so that compileExpression could check
the "arithmetic precedence" option and either call compileOrExpr or continue with its old
code for left-to-right precedence.
[As usual, this documentation was about 10 times as much work as the code changes themselves!]
--Mark