Adding Standard Operator Precedence to the Jack Compiler

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Adding Standard Operator Precedence to the Jack Compiler

cadet1620
Administrator
This post was updated on .
The original grammar for Jack expressions is

expression::=  term (op term)*
term::= int-const | string-const | keyword-const | var-name | (var-name '[' expression ']') | subroutine-call | ( '(' expression ')' ) | (unary-op term)
op::=  ( '+' | '-' | '*' | '/' | '&' | '|' | '<' | '>' | '=' )
unary-op::=  ( '-' | '~' )

My compiler uses recursive descent as suggested in the book. Its operator precedence is left to right associative for binary operators, with higher precedence right to left associative unary operators. The unary operator behavior comes from the production

term::=  unary-op term

New Grammar for Expression

The operator precedence that I want to implement in Jack is based on the C and Java precedence.

PrecedenceOperatorsAssociation
Highest− ~right to left(unary operators)
* /left to right
+ −left to right
< > =left to right
&left to right
Lowest|left to right

Assigning operator precedence doesn't change the syntax, only the semantics, but assigning each level of equal precedence operators to their own syntax element makes implementing this operator precedence easy using recursive descent.

The expression must be defined to consist of nested sub-expression types. Each sub-expression type consists of higher precedence sub-expressions combined with this type's operator(s).

expression::= or-expr
or-expr::= and-expr (or-op and-expr)*
and-expr::= cmp-expr (and-op cmp-expr)*
cmp-expr::= add-expr (cmp-op add-expr)*
add-expr::= mult-expr (add-op mult-expr)*
mult-expr::= term (mult-op term)*
term::= int-const | string-const | keyword-const | var-name | (var-name '[' expression ']') | subroutine-call | ( '(' expression ')' ) | (unary-op term)
or-op::=  '|'
and-op::=  '&'
cmp-op::=  ( '<' | '>' | '=' )
add-op::=  ( '+' | '-' )
mult-op::=  ( '*' | '/' )
unary-op::=  ( '-' | '~' )

Term and unary-op are unchanged from their original definitions.
Expression and or-expr are separated for clarity. Implementation will probably collapse them into a single function.

How it Works

The key to understanding how this works is to realize that in the stack-based model all expressions can be written with the pushes in the original left-to-right order as they appear in the expression; the only thing that evaluation order changes is the position of the operators. Consider the expression a+b*c=d+e and its stack-based code:

Original Precedence
(((a+b)*c)=d)+e
push a
push b
+
push c
*
push d
=
push e
+
New Precedence
(a+(b*c))=(d+e)
push a
push b
push c
*
+
pushd
push e
+
=

This means that if the recursive descent parse/compilation routines write the pushes when they parse terminal elements, and write the operators after parsing the operator's arguments, the correct stack-based code will be written. This is effectively the same as the original version compiling a parenthesized expression; the original recursed to compileExpression when it encountered a '(' in compileTerm.

Original Compiler
a+(b*c)
compileExpression
 | compileTerm
 |  |_ matches var-namepush a
 | matches '+'
 | compileTerm
 |  | matches '('
 |  | compileExpression
 |  |  | compileTerm
 |  |  |  |_ matches var-namepush b
 |  |  | matches '*'
 |  |  | compileTerm
 |  |  |  |_ matches var-namepush c
 |  |  |_ compileOp('*')*
 |  |_ matches ')'
 |_ compileOp('+')+

New Compiler
a+b*c
compileExpression
 | compileAndExpr
 |  | compileCmpExpr
 |  |  | compileAddExpr
 |  |  |  | compileMultExpr
 |  |  |  |  | compileTerm
 |  |  |  |  |  |_ matches var-namepush a
 |  |  |  |  |_
 |  |  |  | matches '+'
 |  |  |  | compileMultExpr
 |  |  |  |  | compileTerm
 |  |  |  |  |  |_ matches var-namepush b
 |  |  |  |  | matches '*'
 |  |  |  |  | compileTerm
 |  |  |  |  |  |_ matches var-namepush c
 |  |  |  |  |_ compileOp('*')*
 |  |  |  |_ compileOp('+')+
 |  |  |_
 |  |_
 |_

Implementation

Notice that all of the xxx-expr elements have the same structure; only the elements they reference changes. Once compileOrExpr has been written, the remaining compileXxxExpr functions can be quickly written using cut-and-paste.

If you are conversant with function pointers, you can write a parameterized compileExpr function that all the compileXxxExpr functions can use. In my compiler (written in Python) they look like this.

def _compileOrExpr(self):
    self._compileExpr('or-expr', self._compileAndExpr, self._isOrOperator)

def _compileAndExpr(self):
    self._compileExpr('and-expr', self._compileCmpExpr, self._isAndOperator)

def _compileCmpExpr(self):
    self._compileExpr('cmp-expr', self._compileAddExpr, self._isCmpOperator)

def _compileAddExpr(self):
    self._compileExpr('add-expr', self._compileMultExpr, self._isAddOperator)

def _compileMultExpr(self):        
    self._compileExpr('mult-expr', self._compileTerm, self._isMultOperator)

def _compileExpr(self, expr, compileSubExpr, isOperator):
    ## Compiles <expr> := <subExpr> (<operator> <subExpr>)*
	
    compileSubExpr()                        # first term on stack

    while (self.tokenizer.tokenType() == TK_SYMBOL and \
            isOperator(self.tokenizer.symbol())):
        operator = self.tokenizer.symbol()
        self._nextToken()
        
        compileSubExpr()                    # next term on stack
        self._compileOperator(operator)     # result on stack

The 'expr' parameter is a string for informational purposes only. It is only used when the "generate XML" option is enabled. I stripped out the XML related code that was interleaved with the compilation code so that compileExpr would be easier to read.

I decided to keep compileOrExpr separate so that compileExpression could check the "arithmetic precedence" option and either call compileOrExpr or continue with its old code for left-to-right precedence.

[As usual, this documentation was about 10 times as much work as the code changes themselves!]

--Mark