Adding Nested Variable Scopes to the Jack Compiler

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Adding Nested Variable Scopes to the Jack Compiler

cadet1620
Administrator
The ability to declare variables within if and while blocks, especially temporary index variables like i and j can be very useful with nested loops.
/** Print a triangle of numbers.
 */
function void Triangle() {
    var int i;
    while (i < 10) {
        var int j;
        while (~(j > i)) {
            do Output.printInt(j);
            do Output.printChar(32);
            let j = j+1;
        }
        do Output.println();
        let i = i-1;
    return;
}

Syntax changes

The book's syntax for if and while statements is:

ifStatement: 'if' '(' expression ')' '{' statements '}' ( 'else' '{' statements '}' )?
whileStatement: 'while' '(' expression ')' '{' statements '}'
statements:  statement*

The new syntax will be:

ifStatement: 'if' '(' expression ')' statement ( 'else' statement )?
whileStatement: 'while' '(' expression ')' statement
statement:  ( statementBlock | letStatement | ifStatement | whileStatement | doStatement | returnStatement )*
statementBlock: '{' varDec* statements '}'
statements:  statement*

Notes:

  1. The modified statement construct allows a nested scope to be used wherever a statement is allowed.
  2. Using statement in the ifStatement and whileStatement constructs instead of the more obvious statementBlock allows single command if, else and while bodies without enclosing braces.
  3. The statementBlock construct is syntactically identical to the subroutineBody construct, but their semantics are sufficiently different that two separate compilation routines are warranted.

Implementation

The classical implementation of nested variables is to allocate them on the stack when entering the statement block which contains them, and deallocating them when exiting the block. This is usually done by directly manipulating the Stack Pointer. This way the minimum required amount of stack is in use at any time. In VM code, this would look something like:
function foo 3  // 3 local variables: i, j, k = local 0, 1, 2
...
push const 0    // Enter subscope with 2 additional variables: x, y
push const 0    // Assuming the working stack was empty, x, y = local 3, 4
...
pop temp 0      // Exit local scope
pop temp 0

There is a problem using this technique in the Jack compiler. The VM Emulator will raise an "out of segment space" error if the code attempts to use local 3 or 4 because it knows that the local segment is only 3 words long.

The generated VM code must allocate all the local variables, regardless of where they are allocated in the function, in the function VM command.

Mapping nested variables to the local segment

An obvious way to map the nested variables onto the local segment is to assign them the next index when they are encountered during compilation. This is wasteful usage of stack space because the variables in a subscope are never used after the end of the subscope; their stack location can be used for variables in subsequent scopes. In the following example there are 7 variables, so 7 words of local segment would be required.

There is a better scheme. Increment the local index when variables are defined in a subscope and decrement the index appropriately when the subscope ends. Keep track of the maximum index used so that the correctly sized local segment can be allocated.

1:  class Example {
2:      field int x, y;
3:      function void scopes(int n) {
4:          var int a, b;           // local 0, 1
5:          if (a = n) {
6:              var int b;          // local 2
7:              while (b > 0) {
8:                  var int a, c;   // local 3, 4
9:                  ...
10:             }
11:         }
12:         while (b = 0) {
13:             var int c, d;       // local 2, 3
14:             ...
15:         }
16:         return;
17:     }
18: }
There are some interesting code generation challenges here:
  1. The required length of the local segment is not known until the end of the function, yet it must be written in the function VM command at the beginning of the function.
  2. The Jack language specifies that variables are automatically initialized to 0. This zeroing is done by the function VM command. Since subscope variables reuse local segment indices, this implies that they must be set to 0 using push and pop VM commands. Note also that c and d on line 13 are set to 0 on every iteration of the while loop.
These issues will be discussed later.

SymbolTable changes

The SymbolTable module presented in the book handles 2 scopes: class and subroutine. The book suggests using 2 hash tables, one for class symbols and one for subroutine symbols.

The new SymbolTable will need to handle an unknown number of nested scopes. There are two fairly obvious designs. One is to have SymbolTable handle all the details of managing scope, the other is for SymbolTable to handle a single scope and pass it a "parent" SymbolTable that it should search when a symbol lookup fails in its scope. The parent scope will in turn search its parent if a lookup fails, and so on.

In my compiler it was least disruptive to take a hybrid approach where SymbolTable.newScope() created and linked the new SymbolTable.

void compileScope() {
    // Enter new subscope
    SymbolTable parentTable = symbolTable;
    symbolTable = symbolTable.newScope();

    // Compile code in the scope
    ...
    // Exit the subscope
    symbolTable = parentTable;
}
symbolTable.newScope() copies the kindIndex array from the current SymbolTable into the subscope SymbolTable. This way no changes are required to symbolTable.varCount(), and the subscope variables will get the correct indices. Note that only kind=VAR symbols will be added to subscopes.

compileSubroutine() needs to enter the subroutine scope similarly instead of calling startSubroutine(), and it must exit the subroutine scope before returning.

After line 8 in the above example, there are 4 SymbolTables in the list. CompilationEngine.symbolTable points to Example.scopes.1.1.

name                parent                kindCount     symbols
                                       fld sta arg var  
Example.scopes.1.1  Example.scopes.1    2   0   1   5   a, c
Example.scopes.1    Example.scopes      2   0   1   3   b
Example.scopes      Example             2   0   1   2   n, a, b
Example             NULL                2   0   0   0   x, y
Storing the "name" field is optional. My compiler uses it for error messages and warnings.

VmWriter changes

The size of the local segment is required in the function VM command at the beginning of the function's VM code, but that value is not known until the end of the function.

The entire function must be compiled before the function command is written. This requires that the VmWriter must be able to write to a memory buffer while compiling a function, then write the function command to the output file, and finally write the buffered function code to the output file.

In my compiler, the most direct way to do this was to pass None (Python's version of null) in place of OutputFile to signal the VmWriter to output to a string list.

CompilationEngine changes

void compileVarDec(boolean initialize)

Variables that are defined in the subroutine scope do not need to be explicitly zeroed because the function VM command zeros the local segment. Variables that are defined in subscopes must be explicitly zeroed because they use memory in the local segment that may have been used by other subscope variables.

This can be handled by passing an initialize parameter to compileVarDec().

void compileSubroutine() {
    // Enter subroutine scope
    ...
    // Swap to buffered VmWriter
    localSize = 0;
    VmWriter bufferWriter = new VmWriter(NULL);
    VmWriter fileWriter = vmWriter;
    vmWriter = bufferWriter;
    compileVarDec(false);   // do not write zeroing code
    ...
    compileStatements();
    // Swap back to output file VmWriter
    vmWriter = fileWriter;
    vmWriter.writeFunction(funName, localSize);
    vmWriter.appendBuffer(bufferWriter.getBuffer());
    // Exit subroutine scope
}
localSize keeps track of the maximum index value allocated by compileVarDec(). At the end of compileScope(), before exiting the subscope SymbolTable, localSize is updated with the current local size if it is larger than localSize.