Nand2Tetris Questions and Answers Forum › Virtual Machine › Project 8

Generated code size

Classic

List

Threaded

31 messages Options

Uli Kastlunger

Generated code size

Hello,

I have a general question to the VMTranslator. I implemted it in a straight forward way, but realizid that the generated VM-Code gets really big. So I made some obvious improvents: For example it is possible to store the code for call- and return-commands in global (assembler) functions, like:

...
@VM.INTERN.<unique-nr>
D=A
@VM.INTERN.GLOBALFUNC
0;JMP
(VM.INTERN.<unique-nr>)
...

the global function can then use the D register and save the ret. address. But I think this technique is not explained in the book. However although I use this optimized code, the OS.vm files compile to

~28.000 LOC (!) (i.e. the OS needs the whole ROM space)

I had a look at Pong.asm (of project 6) and analyzed it's source (reverse-engineering ;) ) and it seems that this code is generated by a really clever vm-translator and its code length is only

~28.000 LOC (but INCLUSIVE Pong)

So did this translator use some clever heuristics (I assume so), beyond the techniques mentioned in the book?

best regards,
uli kastlunger

culchie

Re: Generated code size

Just adding my experiences..
When I wrote the program to translate vm commands to assembly it seemed like a good idea to have functions such as Get_D_fromStack, Get_M_fromStack, Put_D_onStack etc to return the assembly code for these actions.
Then, for example, my function to translate the VM command 'ADD' might involve calls to these 3 functions and just enough code to supply whatever extra assembly code was required (just D=D+M in this case).
This, while making my translator program look neat, resulted in a lot more lines of assembly than were necessary.
By writing a function to implement vm commands directly in assembly without any calls to other functions the number of resultant lines was reduced by more than 50%.

EDITED to improve clarity

Uli Kastlunger

Re: Generated code size

Thanks for your response. I experienced similar issues. It seems that a clean, short translator results in many generated assembly code lines. On the contrary a highly sophisticates translator is able to generate less code.

Uli Kastlunger

Re: Generated code size

After some optimization tasks I managed to reduce the produced code size to 25669 lines (which leads to 24923 words in memory, because the assembler will skip (...) tags).

:)

Dmitriy

Re: Generated code size

Hello everyone.

I've got really stuck on the subject of optimizing my VM translator.
Now it generates 33172 lines of binary code, which is still too many.

I've already done optimization mentioned by Uli Kastlunger. I applied it to
eq, lt, gt, pop, call, return commands but that's all vainly.

Could you advice me?

I thought about an optimization, though I'm sure it won't help much:
When the code is generated, let's run through it once more and remove all
unnecessary @-commands. I.e.:

// One command...
//...
//...
@SP
M=M+1
// And another. Here @SP is unnecessary, but my translator
// pays no attention to such things.
@SP
// ...
// ...

I can't invent any other method.

Thank you.
Dmitriy.

cadet1620

Re: Generated code size

Administrator

Dmitriy,

Please become a registered user and then we can exchange private mail.

As a first idea where to look to improve your VM translation, here is the breakdown of the number of various VM commands used in the OS.

command count

push [all types] 2177

push constant 1489

pop [all types] 535

push local 274

call 232

push argument 228

pop temp 207

label 207

goto 207

add 166

pop local 129

pop pointer 99

if-goto 93

lt 75

or 70

push static 65

return 59

function 58

sub 52

not 50

gt 48

push that 47

push temp 42

pop that 42

pop static 31

push this 28

eq 24

pop argument 19

and 17

neg 12

pop this 8

push pointer 4

--Mark

Fredrik Forsberg

Re: Generated code size

Nice list.

I have managed to compile the OS into 24834 instructions in asm and 23755 instructions in hack. Compares pretty well with the results of Uli Kastrunger.

My general method consists of trying to avoid repeating the same code as much as possible. Storing my state in the ROM memory and jumping to earlier assembled code segments/functions cost six instructions in hack.

So this is my strategy for commands with more than six hack instructions:
The first time I use a new command I just put in the code. The next time the command appears I jump to the earlier coded segmemt, execute and jump back.

I would like to thank cadet1620 for his answers in project 7. I managed to change my code until I reached your bench marks.

Would really appreciate if a similar bench mark could be given for the OS! That would give us happy amateurs/students something to strive towards.

milythael

Re: Generated code size

I've seen a lot of posts recently about instruction count and that is fine up to a point. It is fun to puzzle something out to a set of constraints. I would only suggest the following be kept in mind as a student of software engineering.

Code is designed for more than one purpose, simultaneously. Everyone knows code is designed for machines to repeat a set of instructions. People are generally less aware that code is intended to communicate with people. Writing code is writing first and code second. Your code should help the reader understand what is happening. It should be clear, concise, and easy to follow.

Optimization is evil. It is fun, but it obfuscates design, algorithm, and purpose. In general, optimization should only occur when there is a demonstrated need, and then, only to the extent required to meet that need.

Keep in mind this is a learning exercise. Personally, I think your code will be more valuable to you in the future if you can come back to it and quickly understand the contained concepts than if you come back to a fully optimized version that is as tiny as possible. At least, keep an un-optimized version around for future reference.

http://en.wikipedia.org/wiki/Program_optimization#When_to_optimize

Just my two cents.

cadet1620

Re: Generated code size

Administrator

In reply to this post by Fredrik Forsberg

Fredrik Forsberg wrote

Nice list.
I have managed to compile the OS into 24834 instructions in asm and 23755 instructions in hack. Compares pretty well with the results of Uli Kastrunger.
...
I would like to thank cadet1620 for his answers in project 7. I managed to change my code until I reached your bench marks.
[See this post. --M]
Would really appreciate if a similar bench mark could be given for the OS! That would give us happy amateurs/students something to strive towards.

Nice job on your VM translator.

Translating and assembling the supplied OS .vm files I get

File	Lines
asm	23621
hack	22887

--Mark

cadet1620

Re: Generated code size

Administrator

In reply to this post by milythael

milythael wrote

I've seen a lot of posts recently about instruction count and that is fine up to a point. It is fun to puzzle something out to a set of constraints. I would only suggest the following be kept in mind as a student of software engineering.

Code is designed for more than one purpose, simultaneously. Everyone knows code is designed for machines to repeat a set of instructions. People are generally less aware that code is intended to communicate with people. Writing code is writing first and code second. Your code should help the reader understand what is happening. It should be clear, concise, and easy to follow.

Note that in the case of chapters 7 ans 8 we are talking about machine generated code that is the output of the VM translator [compiler back end]. Without optimization the final output from compiling a Jack program will not fit in the Hack program memory.

I absolutely agree with milythael that source code must be human readable. My current work project involves integrating code from another programmer's past project into a product I've been developing. My next project will involve working on some code that I wrote about 10 years ago that needs some new functionality.

--Mark

Dano

Re: Generated code size

In reply to this post by Fredrik Forsberg

My toolchain compiles original OS .vm files to 17523 hack instructions right now. This is achieved mostly by eliminating SP increments and decrements. Almost all VM operations have 4 flavours, depending on whether they fetch or store argument from stack. Next, compare operation merges nice with following if-goto operation. Binary operations can be converted to unary, when one of the argument is constant. Sequence push constant,add can thus sometimes be translated to

@constant
D=D+A

I am working on a jack to vm compiler now, maybe I will manage to squeze that even more.

What seems to be more important to me is the execution speed, rather than code size (as long as it fits into memory). I am therefore not that eager to factor out code of 6 or more instructions and jump there. I do it with call and return, which are really beasts, but not with compare.

SSA-based jack2asm compiler

Fredrik Forsberg

Re: Generated code size

This is interesting.

So basically, one could read more than one row at a time from the vm-file, understand what it does and make a special cases from that to optimize the vm-translation more.

Seems to make sense! It leaves the simplicity of the translator described in the book behind, though, so I will probably continue with the simpler version I have now until I've finished the Jack-translator.

Thanks for all tips!

Peter Dietrich

Re: Generated code size

This post was updated on .

In reply to this post by Dano

Hi Dano,

my implementation of a few of your suggestions, namely if-goto following comparison and add/sub following push constant, only saved me about 350 lines. Overall I am down to 23145 lines for the naked OS files; however, jumping to call and return routines made a huge difference.
I don't understand what you mean by saying you got rid of SP increments and decrements. How did you do that?

Regards,
Peter

---Edit---

I think, I now have an idea of what you mean by this: avoiding SP increments (by a push) when the consequent command will decrease SP (e.g. most of the arithmetic commands and the if-goto statement). I have done this for all push statements followed by add/sub, now, resulting in ~22300 lines in the assembly code for the OS vm-files.
At this point, however, I cannot imagine to lose another 5000 lines, as the savings seem to matter less and less.

dlk

Re: Generated code size

In reply to this post by cadet1620

Hi,

My VM translator currently translates tools/OS/*.vm down to 20565 assembly instructions
(not counting label lines and comments).
The project/11/Pong/ code, together with the OS, translates to 25679 instructions.

The translator does various peephole optimizations, including optimizing the following sorts of
sequences:

push SEG1 OFF1 ; pop SEG2 OFF2
push SEG OFF ; push constant VAL ; add
push SEG OFF ; push constant VAL ; sub
sequences producing a value followed by unary operation (neg or not)

Also, it uses special case code for various small constants like 0, 1, 0xffff, and sometimes others,
and it uses library routines to support function call and correct signed comparisons.

Of course, one could get the code MUCH smaller (though slower) if one were willing to implement an actual VM interpreter in Hack assembly, rather than just translate the VM code into hack assembly.

If it would be of interest and in keeping with the class rules, I could post my PongGame.asm file to pastebin.com...

dlk

Re: Generated code size

In reply to this post by Dano

Cool!

cadet1620

Re: Generated code size

Administrator

In reply to this post by dlk

I've been poking at optimization off and on in spare time and am now down to 19624 words for the supplied OS.vm files. Dano's still in the lead by about 10%. He's ahead by about 1/2 instruction per VM command. I suspect that there's something common he figured out how to do one instruction shorter than I did...

We're averaging 4.5 to 5 instructions per VM command. It's going to be tough to beat that with an interpreter since the binary VM commands will need to be stored in ROM. You'll need a lot of code that looks something like

    @15-bit data with 2 binary commands
    D=A
    @R15
    M=D
    @RIP
    D=A
    @interpreter
    0;JMP
    (RIP)

You'll need to pack binary VM commands at least 2-to-1.

--Mark

dlk

Re: Generated code size

What I had in mind for an interpreter was a "Threaded Interpretive" code format like that used in some Forth implementations, in which each 16-bit word encodes either a constant push (say, if its high-order bit is zero), or the ROM address of a function to call (once you mask off the high-order bit). When the interpreter decodes one of these ROM address instructions, it pushes the subsequent address in a "register", and jumps to the address to start executing machine code.
For small built-in functions of the VM, this machine code performs the specified operation and when done calls back to the interpreter to continue interpreting where it left off. Some more complicated operations,
including user application function calls, involve pushing the return address on the/a stack and re-entering the interpreter at a new VM-code subroutine.
I think with this scheme, most VM operations are one 16-bit word (constant pushes, add, sub, and, or, eq, lt, gt, not, neg), two 16-bit words (general pushes and pops, goto, if-goto, return, function entry), or 3 16-bit words (function call).

Of course, to interpret each of these 16-bit VM-code words words the interpreter executes many hack instructions...
This would be a solution aimed at squeezing a lot of code into the available 32K words of ROM, not aimed at performance.

cadet1620

Re: Generated code size

Administrator

The problem is that on the Hack computer the VM interpreter cannot read the instructions that it is supposed to interpret from ROM. Somehow the binary VM code needs to get into RAM.

It occurs to me that you could write tighter code to copy the instructions to RAM using something like

    @ 15-bit instruction or ~instruction if the MSB should be set
    D=A                  or D=~A
    @destRamPtr
    AM=M+1
    M=D

The added encoding efficiency of the threaded interpreter might be enough to reduce the size of larger programs.

--Mark

dlk

Re: Generated code size

> The problem is that on the Hack computer the VM interpreter cannot read the instructions that it is supposed to interpret from ROM

Doh! That's what I was missing.... you would definitely need architecture changes then to do what
I wanted.

cadet1620

Re: Generated code size

Administrator

dlk wrote

> The problem is that on the Hack computer the VM interpreter cannot read the instructions that it is supposed to interpret from ROM

Doh! That's what I was missing.... you would definitely need architecture changes then to do what
I wanted.

See this post Hack II: Escaping the Harvard straitjacket.

--Mark