Notes:
- The modified statement construct allows a nested scope to be used
wherever a statement is allowed.
- Using statement in the ifStatement and whileStatement
constructs instead of the more obvious statementBlock allows single command
if, else and while bodies without enclosing braces.
- The statementBlock construct is syntactically identical to the subroutineBody
construct, but their semantics are sufficiently different that two separate
compilation routines are warranted.
Implementation
The classical implementation of nested variables is to allocate them on the stack
when entering the statement block which contains them, and deallocating them when
exiting the block. This is usually done by directly manipulating the Stack
Pointer. This way the minimum required amount of stack is in use at any time.
In VM code, this would look something like:
function foo 3 // 3 local variables: i, j, k = local 0, 1, 2
...
push const 0 // Enter subscope with 2 additional variables: x, y
push const 0 // Assuming the working stack was empty, x, y = local 3, 4
...
pop temp 0 // Exit local scope
pop temp 0
There is a problem using this technique in the Jack compiler. The VM Emulator
will raise an "out of segment space" error if the code attempts to use local 3
or 4 because it knows that the local segment is only 3 words long.
The generated VM code must allocate all the local variables, regardless of where
they are allocated in the function, in the function VM command.
Mapping nested variables to the local segment
An obvious way to map the nested variables onto the local segment is to assign them
the next index when they are encountered during compilation. This is wasteful
usage of stack space because the variables in a subscope are never used after the
end of the subscope; their stack location can be used for variables in subsequent
scopes. In the following example there are 7 variables, so 7 words of
local segment would be required.
There is a better scheme. Increment the local index when variables are defined
in a subscope and decrement the index appropriately when the subscope ends.
Keep track of the maximum index used so that the correctly sized local segment
can be allocated.
1: class Example {
2: field int x, y;
3: function void scopes(int n) {
4: var int a, b; // local 0, 1
5: if (a = n) {
6: var int b; // local 2
7: while (b > 0) {
8: var int a, c; // local 3, 4
9: ...
10: }
11: }
12: while (b = 0) {
13: var int c, d; // local 2, 3
14: ...
15: }
16: return;
17: }
18: }
There are some interesting code generation challenges here:
- The required length of the local segment is not known until the end of the
function, yet it must be written in the function VM command at the beginning
of the function.
- The Jack language specifies that variables are automatically initialized
to 0. This zeroing is done by the function VM command. Since subscope variables
reuse local segment indices, this implies that they must be set to 0 using push
and pop VM commands. Note also that c and d on line 13 are set
to 0 on every iteration of the while loop.
These issues will be discussed later.
SymbolTable changes
The SymbolTable module presented in the book handles 2 scopes: class and subroutine.
The book suggests using 2 hash tables, one for class symbols and one for subroutine
symbols.
The new SymbolTable will need to handle an unknown number of nested scopes. There
are two fairly obvious designs. One is to have SymbolTable handle all the details
of managing scope, the other is for SymbolTable to handle a single scope and pass
it a "parent" SymbolTable that it should search when a symbol lookup fails in its
scope. The parent scope will in turn search its parent if a lookup fails, and so on.
In my compiler it was least disruptive to take a hybrid approach where
SymbolTable.newScope() created and linked the new SymbolTable.
void compileScope() {
// Enter new subscope
SymbolTable parentTable = symbolTable;
symbolTable = symbolTable.newScope();
// Compile code in the scope
...
// Exit the subscope
symbolTable = parentTable;
}
symbolTable.newScope() copies the kindIndex array from the current
SymbolTable into the subscope SymbolTable. This way no changes are required to
symbolTable.varCount(), and the subscope variables will get the correct
indices. Note that only kind=VAR symbols will be added to subscopes.
compileSubroutine() needs to enter the subroutine scope similarly instead
of calling startSubroutine(), and it must exit the subroutine scope
before returning.
After line 8 in the above example, there are 4 SymbolTables in the list.
CompilationEngine.symbolTable points to Example.scopes.1.1.
name parent kindCount symbols
fld sta arg var
Example.scopes.1.1 Example.scopes.1 2 0 1 5 a, c
Example.scopes.1 Example.scopes 2 0 1 3 b
Example.scopes Example 2 0 1 2 n, a, b
Example NULL 2 0 0 0 x, y
Storing the "name" field is optional. My compiler uses it for error messages
and warnings.
VmWriter changes
The size of the local segment is required in the function VM
command at the beginning of the function's VM code, but that value is not known
until the end of the function.
The entire function must be compiled before the function command is written.
This requires that the VmWriter must be able to write to a memory buffer while
compiling a function, then write the function command to the output file, and
finally write the buffered function code to the output file.
In my compiler, the most direct way to do this was to pass None (Python's version
of null) in place of OutputFile to signal the VmWriter to output to a string list.
CompilationEngine changes
void compileVarDec(boolean initialize)
Variables that are defined in the subroutine scope do not need to be explicitly
zeroed because the function VM command zeros the local segment.
Variables that are defined in subscopes must be explicitly zeroed because they
use memory in the local segment that may have been used by other subscope
variables.
This can be handled by passing an initialize parameter to
compileVarDec().
void compileSubroutine() {
// Enter subroutine scope
...
// Swap to buffered VmWriter
localSize = 0;
VmWriter bufferWriter = new VmWriter(NULL);
VmWriter fileWriter = vmWriter;
vmWriter = bufferWriter;
compileVarDec(false); // do not write zeroing code
...
compileStatements();
// Swap back to output file VmWriter
vmWriter = fileWriter;
vmWriter.writeFunction(funName, localSize);
vmWriter.appendBuffer(bufferWriter.getBuffer());
// Exit subroutine scope
}
localSize keeps track of the maximum index value allocated by
compileVarDec(). At the end of compileScope(), before
exiting the subscope SymbolTable, localSize is updated with the
current local size if it is larger than localSize.