This zip has the VM files and the new built-in File class that you need to run the Jack Jack Compiler (a Jack compiler written in Jack).
JackCompiler-1-0.zip
From the README.TXT
The compiler accesses the host file system using a new VM Emulator built-in
class, "File.class". File.class must be copied into the VME's built-in
directory, "nand2tetris/tools/builtInVMCode".
This will allow the VME to access files in ".../builtInVMCode/VmeFileSystem".
See this forum post for more about File.class:
File class for VMEmulator
The compiler can generate both .vm and .xml files, so you can test it against
any of the Chapter 10 or 11 files.
Alas, it will never run in the CPU Emulator, even if I figure out how to add
built-in I/O devices so that it could have a virtual hard disk. My optimizing
VM translator generated 68K of code for the compiler's VM files, without
including the OS .vm files (83K with the OS)!
The compiler is rather slow. On my somewhat long-in-the-tooth laptop it runs
at about 33 lines per second generating just .vm files. If you turn on the
.xml option, it drops to about 13 lines per second.
There's an amazing amount of memory allocation/deallocation and function
calling going on.
Symbolic Constants
One of the first challenges in writing this code was how to deal with symbolic constants. For instance,
the tokenizer defines return values like TK_KEYWORD and KW_WHILE, and the VM writer defines symbols like SEG_LOCAL and OP_NEG. I didn't want to scatter integer constants like 1, 16, 2 and 5 throughout the code; that could be a debugging nightmare!
So how can one Jack class get to constants defined in another class? For that matter, how can a Jack class define constants in the first place? (No fair extending the language.)
The only way one class can get to another class's data is by calling one of its subroutines. There need
to be functions that return the constants. For example, from VmWriter.jack:
function int SEG_TEMP() { return 5; }
function int SEG_THAT() { return 6; }
function int SEG_THIS() { return 7; } // MUST be last
(SEG_THIS must be last because it is used to define an array size.)
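As a minimal sketch of the same pattern, here is a made-up class (Color and newCounters() are hypothetical names, not part of the compiler) showing why the last constant matters when it doubles as an array size:

```jack
// Hypothetical class, for illustration only; not from the compiler source.
class Color {
    function int RED()   { return 0; }
    function int GREEN() { return 1; }
    function int BLUE()  { return 2; }   // MUST be last

    /** Allocates one counter per color, sized from the last constant. */
    function Array newCounters() {
        var Array counters;
        var int i, size;
        let size = Color.BLUE() + 1;     // highest constant + 1 entries
        let counters = Array.new(size);
        let i = 0;
        while (i < size) {
            let counters[i] = 0;
            let i = i + 1;
        }
        return counters;
    }
}
```

If a new constant were added after BLUE() without updating newCounters(), the array would be one entry too short, which is the reason for the "MUST be last" comment.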
These constants can then be used like this example from compileSubroutine():
    if (subroutineType = JackTokenizer.KW_METHOD()) {
        // 'this' is hidden argument 0
        let str = "~this~";
        do symbolTable.define (str, className, SymbolTable.ARG());
        do str.dispose();
    }
String Management
Consider the movement of an identifier name from the tokenizer to the compiler to the symbol table
when a new variable is defined.
The tokenizer allocates a new String to hold the identifier when it is parsed. The Tokenizer.identifier()
method may be called multiple times for the same identifier, so the returned String must be considered
owned by the tokenizer. The compiler gets this String from the tokenizer and passes it to the
symbol table, which then creates a new String copied from the tokenizer's String. The copied String
is then added to the hash table.
The next time that the tokenizer parses an identifier, the existing identifier String will be
destroyed, and a new one will be allocated for the new identifier.
I have a StrUtil class that is a collection of useful functions like duplicate(), which does the
allocation and copy used in the above example.
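For readers who want to picture it, a duplicate() function could look roughly like the sketch below. This is my guess at the shape using only the standard Jack OS String methods; the class name StrUtilSketch is made up and the real StrUtil code may well differ.

```jack
// Hypothetical sketch; the real StrUtil.duplicate() may differ.
class StrUtilSketch {
    /** Returns a newly allocated copy of src. The caller owns the copy
     *  and must dispose() it; src is left untouched. */
    function String duplicate(String src) {
        var String copy;
        var int i, len;
        let len = src.length();
        let copy = String.new(len);      // capacity is exactly the source length
        let i = 0;
        while (i < len) {
            do copy.appendChar(src.charAt(i));
            let i = i + 1;
        }
        return copy;
    }
}
```

With something like this, the symbol table can keep its own copy of a name while the tokenizer stays free to destroy the original on the next identifier.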
Another interesting issue is string concatenation. For instance a subroutine name is formed as
full_name = class_name + '.' + subroutine_name.
More StrUtil functions to the rescue:
// Target name = objectName.subroutineName
let objectName = StrUtil.appendCharGrow(objectName, 46); // '.'
let objectName = StrUtil.appendGrow(objectName, subroutineName);
do vmWriter.writeCall(objectName, numArgs);
The reason for assigning the return value back to objectName is that if objectName is
too small to hold the result, the appendGrow() functions will allocate a new String that is large
enough to hold the result and destroy the original objectName String. The new String has a
different address than the original objectName, so the local variable must be updated.
(This sort of thing goes on automatically behind the scenes when you use strings in a language like
Java or Python. Strings there are immutable, so concatenation builds a new string object, the
variable is rebound to it, and the garbage collector reclaims the old one.)
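To make the "allocate, copy, destroy the original" idea concrete, here is a simplified sketch of an appendGrow()-style function. It is an assumption on my part, not the actual StrUtil code: the standard Jack String API gives no way to ask a String for its capacity, so this version always allocates a new String, whereas the real function reallocates only when the original is too small. The class name StrGrowSketch is hypothetical.

```jack
// Hypothetical sketch; always reallocates, unlike the real appendGrow().
class StrGrowSketch {
    /** Returns a String holding dst followed by src. dst is disposed,
     *  so the caller must use the return value in its place.
     *  src is left untouched. */
    function String appendGrow(String dst, String src) {
        var String result;
        var int i, n;
        let result = String.new(dst.length() + src.length());
        let n = dst.length();
        let i = 0;
        while (i < n) {                  // copy the original contents
            do result.appendChar(dst.charAt(i));
            let i = i + 1;
        }
        let n = src.length();
        let i = 0;
        while (i < n) {                  // append the new contents
            do result.appendChar(src.charAt(i));
            let i = i + 1;
        }
        do dst.dispose();                // the old address is now invalid
        return result;
    }
}
```

Either way the calling convention is the same: always write `let s = ...appendGrow(s, t);` and never keep using the old reference.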
There are lots of printed error messages in the code like "Expected something, got
whatever" that need to print lots of string constants. In Jack, a string constant allocates a new
String every time it is evaluated, so each one must be disposed after printing. OutUtil has
printStringConst(), which automatically destroys its argument. Here's another example from the compiler:
    /**
     * Expect the current token to be a symbol. Print error message
     * and abort compile if this is not so.
     *
     * Returns true if a matching symbol is found.
     * Returns false to abort compile.
     */
    method boolean _expectSymbol(char symbol) {
        if ((token.tokenType() = JackTokenizer.TK_SYMBOL()) & (token.symbol() = symbol)) {
            return true;
        }
        do _printErrorLine();
        do OutUtil.printStringConst("Expected ");
        do _printSymbol(symbol, false);
        do OutUtil.printStringConst(", got ");
        do _printToken();
        do Output.println();
        return false;
    }
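A print-and-dispose helper of this kind can be very small. The sketch below shows roughly what I mean, using only the standard Output.printString() and String.dispose() calls; the class name OutUtilSketch is made up and the real OutUtil may do more.

```jack
// Hypothetical sketch; the real OutUtil.printStringConst() may differ.
class OutUtilSketch {
    /** Prints s, then disposes it. Intended only for string constants,
     *  which are allocated afresh each time they are evaluated. */
    function void printStringConst(String s) {
        do Output.printString(s);
        do s.dispose();
        return;
    }
}
```

Do not pass a String that is still needed afterwards, such as one owned by the tokenizer; the helper frees whatever it is given.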
[Enough for now. Next installment will be dealing with error returns without memory leaks.]
--Mark