The Hack assembler written in Nand2Tetris is an
absolute assembler.
The entire source code must be in one file and the .hack file that it produces is a ROM image that must be loaded starting at address 0. RAM symbols are allocated starting at address 16.
This sort of assembler works quite well for its intended purposes in this course: learning the Hack Assembly Language by writing small programs, learning the design of a simple assembler and writing one, and assembling large machine generated assembly language files as generated by the VM translator.
This does not work well it you are writing a large assembly language program that will consist of many source files, possibly written by different team members. This would require that all the ASM files be combined into a single file and run through the assembler.
Imagine the chaos—how many of those files have the label (LOOP) in them?
Overview
The solution is to assemble all the ASM files separately and use a
link editor (linker) that can combine multiple binary files into the final program binary.
For this to work, the ASM code needs to list what symbols are to be available for other ASM files to use, and what symbols it needs to use from other ASM files. This is called linkage information. Typical linkage info looks like this:
.public (TETRA)
.extern (LSUM)
.extern LSUM_n
.extern LSUM_p
Another issue is that the binary files have memory addresses for both code labels and variables. These addresses will be incorrect if the binary is loaded into memory anywhere other than starting at address 0. Correcting these addresses when the binary is moved to a new starting address is called relocation.
The linkage and relocation information needs to be written into the binary files so that the linker will be able to use it. Binary files that include this information are relocatable object files.
Relocatable object files
Every relocatable object file is generated with its ROM addresses starting at address 0, and their RAM variables will be allocated starting at address 0. (This is different than the normal Hack assembler which allocates RAM variables starting at address 16; the linker will ensure that RAM variables in the linked binary will start at address 16.)
The built in symbols R0–R15, SP, LCL, ARG, THIS, THAT, SCREEN and KBD are absolute (constant) RAM addresses, not relocatable variables.
Hack relocatable object files will be similar format to the existing .hack files, writing the data values in binary. The linkage and relocation information will use the following suffixes on the binary value.
Linkage Information |
value PC symbol |
Public label, value is the label's ROM address. |
value PD symbol |
Public variable, value is the variable's RAM address. |
value XC symbol |
External label, value is the label's extern ID1. |
value XD symbol |
External variable, value is the variable's extern ID. |
value LC |
Code length, value is the number of ROM words required. |
value LD |
Data length, value is the number of RAM words required. |
Relocation Information |
value X |
External address, value is the extern ID for a label or variable. |
value C |
Code address, value is relocated relative to this object's allocated2 ROM address. |
value D |
Data address, value is relocated relative to this object's allocated RAM address. |
|
1 extern ID is a unique value for each .extern in the source file.
|
2 allocated addresses are the starting addresses for the object's code and data. They are determined by the linker after it reads all the object headers.
|
The linkage information is written as a header at the beginning of the object file. This allows the linker to quickly read the info without reading the entire file. For the same reason, the header should include the required ROM and RAM sizes, although these sizes could be determined by reading and parsing the entire file. The order of the linkage records within the header is unimportant.
Address relocation will be discussed in the next section.
Figure 1:
Sample source and resulting relocatable object file
|
// File: Tetra.asm
//
// Set RAM[0..9] to the first 10
// tetrahedral numbers:
// 1, 4, 10, 20, 35, 56, 84, 120,
// 165, 220.
.public (TETRA)
.extern (LSUM)
.extern LSUM_n
.extern LSUM_p
.extern (__HALT__)
(TETRA)
// Make list of 1's.
@10 // p = 10
D=A
@p
M=D
(LOOP) // do
@p // *--p = 1
AM=M-1
M=1
D=A // while p > 0
@LOOP
D;JGT
// Do Linear Sum 3 times to make
// tetrahedral numbers.
@3 // i = 3
D=A
@i
M=D
(LOOP2) // do {
@10 // call LSUM(n=10, p=0)
D=A
@LSUM_n
M=D
@LSUM_p
M=0
@RIP
D=A
@LSUM
0;JMP
(RIP)
@i // } while --i > 0
MD=M-1
@LOOP2
D;JGT
@__HALT__
0;JMP
|
|
Tetra.hackr
0000000000000000 XC LSUM Linkage header
0000000000000001 XD LSUM_n
0000000000000010 XD LSUM_p
0000000000000000 PC TETRA
0000000000000011 XC __HALT__
0000000000011110 LC
0000000000000010 LD
0000000000001010 First instruction
1110110000010000
0000000000000000 D Relocatable data address: @p
1110001100001000
0000000000000000 D
1111110010101000
1110111111001000
1110110000010000
0000000000000100 C Relocatable code address: @LOOP
1110001100000001
0000000000000011 Constant value: @3
1110110000010000
0000000000000001 D Relocatable data address: @i
1110001100001000
0000000000001010
1110110000010000
0000000000000001 X External [data] reference, ID = 1: LSUM_n
1110001100001000
0000000000000010 X
1110101010001000
0000000000011000 C Relocatable code address: @RIP
1110110000010000
0000000000000000 X External [code] reference, ID = 0: LSUM
1110101010000111
0000000000000001 D
1111110010011000
0000000000001110 C
1110001100000001
0000000000000011 X
1110101010000111
|
Link Editor
The
link editor (or linker) combines the relocatable object files into a single memory image (.hack) file. This is a two pass operation like the assembler. The first pass reads the linkage and size information from the headers in the object files and computes the starting ROM and RAM addresses for each object file. Given this information, the object's code and data can be allocated in ROM and RAM, and the final locations for all the public symbols will be known.
The second pass copies the program code from the object files to the executable file, supplying all the external link addresses and relocating all the internal memory addresses using the information collected in the first pass.
There is a small issue of where the program entry point is. With the original Assembler, the first instruction in the ASM file is the first instruction executed. Now that multiple ASM files are involved, The entry point can be handled in a couple of ways.
- Do nothing. The program will begin execution with first instruction in the first object file that was processed.
- Write a bootstrap jump to _ _MAIN_ _ as the first instruction(s) in the output file. One of the object files must have a public symbol named _ _MAIN_ _.
- If _ _MAIN_ _ is at offset 0 in one of the object files, the jump instruction can be eliminated by locating that file as the first object in the output file. This requires a bit of work between pass 1 and pass 2 to adjust the starting object code addresses that were determined during pass 1.
The jump instruction is insignificant compared to the size of a typical program so I chose not to rearrange the objects. I do skip writing the jump if the entry point is offset 0 in the first object file. I include command options to rename the entry point label and to entirely eliminate the jump regardless of entry point location.
Figure 2:
Object files relocated into ROM image and Linkage map from my linker
|
LSum.hackr |
code |
data |
21 words |
3 words |
Main.hackr |
code |
data |
4 words |
0 words |
Tetra.hackr |
code |
data |
30 words |
2 words |
| |
LSum.hack |
code |
data |
0 |
(2) |
<boot> |
16 |
(0) |
<boot> |
2 |
(21) |
LSum |
16 |
(3) |
LSum |
23 |
(4) |
Main |
19 |
(0) |
Main |
27 |
(30) |
Tetra |
19 |
(2) |
Tetra |
57 |
|
|
21 |
|
|
| |
[D:/TECS/projects/bin/rasm/example]
% hlink -v -o Tetra.hack *.hackr
File LSum.hackr:
code segment 2 L 21
15 LSUM Tetra.hackr
data segment 16 L 3
17 LSUM_n Tetra.hackr
16 LSUM_p Tetra.hackr
File Main.hackr:
code segment 23 L 4
26 __HALT__ LSum.hackr Tetra.hackr
23 __MAIN__ <ENTRY POINT>
data segment 19 L 0
File Tetra.hackr:
code segment 27 L 30
27 TETRA Main.hackr
data segment 19 L 2
Code size = 57 (0x0039)
Data size = 21 (0x0015)
|
Chaos ensues...
I have been experimenting with a more complex project than the simple example, and had quite a lot of trouble debugging the linked executable. The Hack Assembler's automatic allocation of RAM variables interacts very badly with missing .extern declarations for code labels. I missed a few of the .externs in several files and ended up with calls that jumped to RAM addresses. After finding the second wild jump, I decided that automatic variable allocation is not compatible with multiple source file usage.
I added a ".data" command to explicitly allocate RAM variables. At the same time I allowed multiple symbols on all the "." commands. Tetra.asm now starts like this:
// File: Tetra.asm
// Set RAM[0..9] to the first 10 tetrahedral numbers:
// 1, 4, 10, 20, 35, 56, 84, 120, 165, 220.
.public (TETRA)
.extern (LSUM), LSUM_n, LSUM_p // from LSum.asm
.extern (__HALT__) // from Main.asm
.data p, i
(TETRA)
// Make list of 1's.
@10 // p = 10
D=A
@p
M=D
Linking Pong
The "more complex project" mentioned in the previous section is Pong, linked with a set of individually compiled OS object files.
I wrote some Python scripts to split the .asm files generated by my VM Translator into individual .asm files. At simpler optimization level, the only common code routines that my VM translator uses are call, return, and compare. These common routines and the bootstrap are split to their own file, VM.asm. This results in the linkable OS library sources:
Array.asm, Keyboard.asm, Math.asm, Memory.asm, Output.asm, Screen.asm, String.asm, Sys.asm, VM.asm
Using the same splitter on Pong generates its sources:
Ball.asm, Bat.asm, Main.asm, PongGame.asm
Pong builds successfully, and it runs!!!
[D:/TECS/projects/bin/rasm/Pong]
% ../rasm *.asm
Ball.asm
C: 1941, D: 0
Bat.asm
C: 879, D: 0
Main.asm
C: 82, D: 0
PongGame.asm
C: 1674, D: 1
[D:/TECS/projects/bin/rasm/Pong]
% ../hlink -v -o Pong.hack *.hackr OS/*.hackr
read Ball.hackr
read Bat.hackr
read Main.hackr
read PongGame.hackr
read OS\Array.hackr
read OS\Keyboard.hackr
read OS\Math.hackr
read OS\Memory.hackr
read OS\Output.hackr
read OS\VM.hackr
read OS\Screen.hackr
read OS\String.hackr
read OS\Sys.hackr
write Pong.hack
Code size = 23479 (0x5BB7)
Data size = 30 (0x001E)
| |
|
Here's a portion of the link map. The map shows the address for all the publics, and a list of files that reference that public. The number in () is the number of references to that public from the named file.
You can see that the initial jump in the screen shot is going to __MAIN__ in VM.asm which is the bootstrap code. The bootstrap calls (Sys.init). As expected, VM.asm is the only reference to (Sys.init).
File <ENTRY POINT>:
code segment 2 L 0
data segment 16 L 0
File ball.hackr:
code segment 2 L 1941
2 Ball.new ponggame.hackr(1)
148 Ball.dispose ponggame.hackr(1)
181 Ball.show
234 Ball.hide
287 Ball.draw
353 Ball.getLeft ponggame.hackr(1)
367 Ball.getRight ponggame.hackr(1)
386 Ball.setDestination ponggame.hackr(1)
789 Ball.move ponggame.hackr(1)
1201 Ball.bounce ponggame.hackr(1)
data segment 16 L 0
File bat.hackr:
code segment 1943 L 879
1943 Bat.new ponggame.hackr(1)
2033 Bat.dispose ponggame.hackr(1)
2066 Bat.show
2119 Bat.hide
2172 Bat.draw
2243 Bat.setDirection ponggame.hackr(2)
2263 Bat.getLeft ponggame.hackr(1)
2277 Bat.getRight ponggame.hackr(1)
2298 Bat.setWidth ponggame.hackr(1)
2360 Bat.move ponggame.hackr(2)
data segment 16 L 0
. . . . . . . . . . . . . . . . . . . . . . . .
File os\math.hackr:
code segment 5189 L 1453
5189 Math.init os\sys.hackr(1)
5372 Math.abs ball.hackr(2) os\screen.hackr(2)
5404 Math.multiply ball.hackr(11) os\output.hackr(3) os\screen.hackr(13) os\string.hackr(2)
5743 Math.divide ball.hackr(6) os\output.hackr(1) os\screen.hackr(5) os\string.hackr(1)
6366 Math.sqrt
6566 Math.max os\screen.hackr(2)
6604 Math.min os\screen.hackr(2)
data segment 17 L 2
. . . . . . . . . . . . . . . . . . . . . . . .
File os\sys.hackr:
code segment 22944 L 349
22944 Sys.init os\vm.hackr(1)
23062 Sys.halt
23077 Sys.wait ponggame.hackr(2)
23182 Sys.error os\array.hackr(1) os\math.hackr(2) os\memory.hackr(2) os\output.hackr(1)
os\screen.hackr(5) os\string.hackr(7)
data segment 30 L 0
File os\vm.hackr:
code segment 23293 L 186
23293 __MAIN__
23320 $CALL$ ball.hackr(30) bat.hackr(18) main.hackr(4) os\array.hackr(3)
os\keyboard.hackr(16) os\math.hackr(9) os\memory.hackr(2) os\output.hackr(122)
os\screen.hackr(49) os\string.hackr(16) os\sys.hackr(13) ponggame.hackr(52)
23365 $CMP_JLT$ ball.hackr(10) bat.hackr(1) os\math.hackr(13) os\memory.hackr(3)
os\output.hackr(7) os\screen.hackr(22) os\string.hackr(8) os\sys.hackr(1)
ponggame.hackr(2)
23404 $CMP_JGT$ ball.hackr(2) bat.hackr(1) os\array.hackr(1) os\keyboard.hackr(2)
os\math.hackr(10) os\memory.hackr(3) os\output.hackr(4) os\screen.hackr(18)
os\string.hackr(6) os\sys.hackr(2) ponggame.hackr(3)
23430 $CMP_JEQ$ ball.hackr(6) bat.hackr(1) os\keyboard.hackr(3) os\math.hackr(2)
os\memory.hackr(7) os\output.hackr(7) os\screen.hackr(2) os\string.hackr(8)
ponggame.hackr(7)
23440 $RETURN$ ball.hackr(10) bat.hackr(10) main.hackr(1) os\array.hackr(2)
os\keyboard.hackr(5) os\math.hackr(7) os\memory.hackr(5) os\output.hackr(12)
os\screen.hackr(11) os\string.hackr(13) os\sys.hackr(4) ponggame.hackr(6)
data segment 30 L 0
Code size = 23479 (0x5BB7)
Data size = 30 (0x001E)
Using my VM translator to translate Pong generates 23471 instructions. The 8 extra instructions are due to the bootstrap jump and some jumps required by moving the common code to VM.asm.
Future work
Named constants: Add ".const symbol=number" command to define constants and non-relocatable addresses. Constants can be public or external like any other symbol. There will need to be two new linkage records: PA and EA (A for absolute). Syntax for external constant will likely be the two statement sequence
.extern X_SIZE .const X_SIZE which mirrors declaring an external variable.
Include files: Add ".include" command to support a single set of externs across a project. This might have saved me from the above chaos. This will require that the parser can push its current context to read the (possibly nested) include files and then pop its context to continue with the current file.
Library search: If there are unresolved externs, search library directories (-L options, HLIB=path;...) for .hackr files that define them.
Support function level linking: The obvious scheme is to add a .function command to the ASM to mark the beginning of an independent block of code and generate linkage tables for each functions. The Assembler will need to be aware of references between functions and write public/extern linkages for them. Seems like a lot of work. Is it worth it? For Pong, it would reduce the size from 23K words to 19K — 17% of the OS code is uncalled.