Hello,
I have a general question about the VM translator. I implemented it in a straightforward way, but realized that the generated assembly code gets really big. So I made some obvious improvements. For example, it is possible to store the code for the call and return commands in global (assembler) functions, like:
...
@VM.INTERN.<unique-nr>
D=A
@VM.INTERN.GLOBALFUNC
0;JMP
(VM.INTERN.<unique-nr>)
...
The global function can then use the D register to save the return address. As far as I can tell, this technique is not explained in the book. However, even with this optimized code, the OS .vm files compile to
~28,000 LOC (!), i.e. the OS alone needs nearly all of the 32K-word ROM.
I had a look at Pong.asm (from project 6) and analyzed its source (reverse engineering ;) ). It seems to have been generated by a really clever VM translator, and its code length is only
~28,000 LOC (but INCLUDING Pong).
So did this translator use some clever heuristics (I assume so), beyond the techniques mentioned in the book?
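A minimal sketch of the shared-routine idea as a Python translator back end (the post doesn't say which language was used). The calling convention is my assumption, not necessarily Uli's: D carries the return address as in the snippet above, R13 carries nArgs and R14 the callee's address. `VM.INTERN.CALL` is a hypothetical name for the shared routine.

```python
# Sketch: factor the VM "call" sequence into one shared Hack routine.
# Assumed convention (hypothetical): on entry to VM.INTERN.CALL,
# D = return address, R13 = nArgs, R14 = address of the callee.

SHARED_CALL = "VM.INTERN.CALL"  # hypothetical label for the shared routine


def push_d() -> list[str]:
    """Push the D register onto the stack."""
    return ["@SP", "A=M", "M=D", "@SP", "M=M+1"]


def shared_call_routine() -> list[str]:
    """Emitted once: saves the caller's frame, sets ARG and LCL, jumps to the callee."""
    code = [f"({SHARED_CALL})"]
    code += push_d()                               # push return address (in D)
    for pointer in ("LCL", "ARG", "THIS", "THAT"):
        code += [f"@{pointer}", "D=M"] + push_d()  # save caller's segment pointers
    code += [
        "@SP", "D=M", "@R13", "D=D-M",             # D = SP - nArgs
        "@5", "D=D-A", "@ARG", "M=D",              # ARG = SP - nArgs - 5
        "@SP", "D=M", "@LCL", "M=D",               # LCL = SP
        "@R14", "A=M", "0;JMP",                    # goto callee
    ]
    return code


def call_site(function: str, n_args: int, unique: int) -> list[str]:
    """Emitted for every 'call function n_args': parameters, jump, return label."""
    ret = f"VM.INTERN.{unique}"
    return [
        f"@{n_args}", "D=A", "@R13", "M=D",        # R13 = nArgs
        f"@{function}", "D=A", "@R14", "M=D",      # R14 = callee address
        f"@{ret}", "D=A",                          # D = return address
        f"@{SHARED_CALL}", "0;JMP",
        f"({ret})",
    ]
```

With these choices each `call` site costs 12 words, while the 48-word shared routine is emitted only once. Inlining every call in the straightforward way costs about 49 words per call, so the saving grows with the number of calls.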
best regards,
uli kastlunger
Just adding my experiences...
When I wrote the program to translate VM commands to assembly, it seemed like a good idea to have functions such as Get_D_fromStack, Get_M_fromStack, Put_D_onStack, etc., that return the assembly code for these actions.
Then, for example, my function for translating the VM command 'ADD' would call these three functions and add just enough code to supply whatever extra assembly was required (just D=D+M in this case).
This made my translator program look neat, but it produced far more lines of assembly than necessary.
By writing each VM command directly in assembly, without calls to helper functions, I reduced the number of resulting lines by more than 50%.
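To make the difference concrete, here is a Python sketch of a VM `add` generated both ways. The helper names mirror the post, but their bodies are my assumption of what such helpers typically emit.

```python
# Helper-based generation: each helper is correct on its own, but composing
# them moves SP three times for a single 'add'.

def get_d_from_stack() -> list[str]:
    return ["@SP", "AM=M-1", "D=M"]                 # pop top value into D


def get_m_from_stack() -> list[str]:
    return ["@SP", "AM=M-1"]                        # pop; A now addresses the value, so M = value


def put_d_on_stack() -> list[str]:
    return ["@SP", "A=M", "M=D", "@SP", "M=M+1"]    # push D


def add_composed() -> list[str]:
    return get_d_from_stack() + get_m_from_stack() + ["D=D+M"] + put_d_on_stack()


def add_direct() -> list[str]:
    # y is popped into D and x is overwritten in place, so SP moves only once.
    return ["@SP", "AM=M-1", "D=M", "A=A-1", "M=D+M"]
```

The composed version is 11 instructions against 5 for the direct one, in line with the reduction of more than 50% reported above.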
Thanks for your response. I experienced similar issues. It seems that a clean, short translator generates many lines of assembly code, whereas a highly sophisticated translator can generate much less.
After some optimization work I managed to reduce the generated code to 25669 lines, which assemble into 24923 words in memory, because the assembler emits nothing for label declarations like (LOOP).
:)
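The gap between those two figures is the number of label declarations (25669 - 24923 = 746). A small Python helper (my own, not part of the course tools) that reports both numbers for an .asm file:

```python
def count_lines_and_words(asm_text: str) -> tuple[int, int]:
    """Return (non-blank code lines, ROM words) for Hack assembly text.

    Comments (// ...) and blank lines are ignored; label declarations such
    as (LOOP) count as lines but occupy no ROM word.
    """
    lines = 0
    words = 0
    for raw in asm_text.splitlines():
        line = raw.split("//", 1)[0].strip()
        if not line:
            continue
        lines += 1
        if not (line.startswith("(") and line.endswith(")")):
            words += 1
    return lines, words


sample = "(LOOP)\n@SP // top of stack\nM=M+1\n\n@LOOP\n0;JMP\n"
print(count_lines_and_words(sample))  # (5, 4)
```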
Hello everyone.
I've got really stuck on optimizing my VM translator.
It now generates 33172 lines of binary code, which is still too many to fit in the 32K ROM.
I've already done the optimization mentioned by Uli Kastlunger. I applied it to
the eq, lt, gt, pop, call, and return commands, but all in vain.
Could you advise me?
I thought about one more optimization, though I'm sure it won't help much:
after the code is generated, run through it once more and remove all
unnecessary @-commands, e.g.:
// One command...
//...
//...
@SP
M=M+1
// And another. Here @SP is unnecessary, but my translator
// pays no attention to such things.
@SP
// ...
// ...
I can't think of any other method.
Thank you.
Dmitriy.
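A second pass like that can be written as a simple peephole filter. This Python sketch is my own and deliberately conservative: it drops an `@X` only while A is still known to hold X, and it forgets that knowledge at every label (other code may jump there) and after any instruction that writes A. As Dmitriy suspects, on its own it probably saves only a modest number of words.

```python
import re

# Destination part of a C-instruction, e.g. "AM" in "AM=M-1".
_DEST = re.compile(r"^([ADM]+)=")


def drop_redundant_a_loads(asm: list[str]) -> list[str]:
    """Remove '@X' when A is already known to hold X.

    Knowledge of A is discarded at every label and at every instruction whose
    destination includes A. Jumps do not change A, so fall-through keeps it.
    """
    out: list[str] = []
    known_a = None                      # symbol or constant A currently holds
    for ins in asm:
        if ins.startswith("("):         # label: may be reached from elsewhere
            known_a = None
            out.append(ins)
        elif ins.startswith("@"):
            if ins[1:] == known_a:
                continue                # A already holds this value
            known_a = ins[1:]
            out.append(ins)
        else:
            m = _DEST.match(ins)
            if m and "A" in m.group(1):
                known_a = None          # A was overwritten
            out.append(ins)
    return out
```

For Dmitriy's example, `["@SP", "M=M+1", "@SP", "AM=M-1"]` becomes `["@SP", "M=M+1", "AM=M-1"]`, because M=M+1 leaves A pointing at SP.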
Dmitriy,
Please become a registered user and then we can exchange private mail.
As a first hint about where to look when improving your VM translation, here is a breakdown of how often each VM command is used in the OS.
command | count |
push [all types] | 2177 |
push constant | 1489 |
pop [all types] | 535 |
push local | 274 |
call | 232 |
push argument | 228 |
pop temp | 207 |
label | 207 |
goto | 207 |
add | 166 |
pop local | 129 |
pop pointer | 99 |
if-goto | 93 |
lt | 75 |
or | 70 |
push static | 65 |
return | 59 |
function | 58 |
sub | 52 |
not | 50 |
gt | 48 |
push that | 47 |
push temp | 42 |
pop that | 42 |
pop static | 31 |
push this | 28 |
eq | 24 |
pop argument | 19 |
and | 17 |
neg | 12 |
pop this | 8 |
push pointer | 4 |
--Mark
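Summing the table is a quick way to see where the effort pays off. The Python sketch below uses only the counts above; the two "[all types]" rows are totals of the per-segment rows, so it checks them instead of adding them in. Push and pop make up about two thirds of the 4082 commands, and push constant alone over a third.

```python
# Counts copied from Mark's table; "[all types]" rows are recomputed, not added.
PUSH = {"constant": 1489, "local": 274, "argument": 228, "static": 65,
        "that": 47, "temp": 42, "this": 28, "pointer": 4}
POP = {"temp": 207, "local": 129, "pointer": 99, "that": 42,
       "static": 31, "argument": 19, "this": 8}
OTHER = {"call": 232, "label": 207, "goto": 207, "add": 166, "if-goto": 93,
         "lt": 75, "or": 70, "return": 59, "function": 58, "sub": 52,
         "not": 50, "gt": 48, "eq": 24, "and": 17, "neg": 12}

push_total = sum(PUSH.values())
pop_total = sum(POP.values())
all_commands = push_total + pop_total + sum(OTHER.values())

print(push_total, pop_total, all_commands)                        # 2177 535 4082
print(f"push constant share: {PUSH['constant'] / all_commands:.1%}")  # 36.5%
```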
Nice list.
I have managed to compile the OS into 24834 instructions in asm and 23755 instructions in hack, which compares pretty well with Uli Kastlunger's results.
My general method is to avoid repeating the same code as much as possible. Saving my return point (a ROM address) and jumping to an earlier assembled code segment/function costs six Hack instructions.
So this is my strategy for commands that need more than six Hack instructions:
The first time a command appears, I emit its code inline. The next time the same command appears, I jump to the earlier coded segment, execute it, and jump back.
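A Python sketch of that strategy, under my own assumptions: R13 holds the return address, a shared body expects nothing in D or R13 on entry, and identical generated code (not just identical VM text, since static references differ between files) is what gets shared. The SHARED.* label names are hypothetical.

```python
# Sketch: inline the first occurrence of a long code body, jump to it later.


class SegmentSharer:
    CALL_COST = 6  # @ret, D=A, @R13, M=D, @entry, 0;JMP

    def __init__(self) -> None:
        self.entries: dict[tuple[str, ...], str] = {}
        self.counter = 0

    def _fresh(self, prefix: str) -> str:
        self.counter += 1
        return f"{prefix}.{self.counter}"

    def emit(self, body: list[str]) -> list[str]:
        if len(body) <= self.CALL_COST:
            return list(body)                      # too short to be worth sharing
        key = tuple(body)
        ret = self._fresh("SHARED.RET")
        save_ret = [f"@{ret}", "D=A", "@R13", "M=D"]
        if key in self.entries:                    # later occurrence: 6 words
            return save_ret + [f"@{self.entries[key]}", "0;JMP", f"({ret})"]
        entry = self._fresh("SHARED.SEG")
        self.entries[key] = entry
        # First occurrence: run the body in place, then return through R13
        # exactly as a later caller would.
        return save_ret + [f"({entry})"] + body + ["@R13", "A=M", "0;JMP", f"({ret})"]
```

Each reuse costs 6 words. The first occurrence pays 7 extra words for the return-address setup and the jump back, so sharing only pays off when a body is longer than six words and recurs often enough to recover that overhead.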
I would like to thank cadet1620 for his answers in project 7; I kept changing my code until I reached your benchmarks.
I would really appreciate it if a similar benchmark could be given for the OS! That would give us happy amateurs/students something to strive towards.
I've seen a lot of posts recently about instruction count, and that is fine up to a point. It is fun to puzzle something out to a set of constraints. I would only suggest that the following be kept in mind as a student of software engineering.
Code is designed for more than one purpose, simultaneously. Everyone knows code is designed for machines to repeat a set of instructions. People are generally less aware that code is intended to communicate with people. Writing code is writing first and code second. Your code should help the reader understand what is happening. It should be clear, concise, and easy to follow.
Optimization is evil. It is fun, but it obfuscates design, algorithm, and purpose. In general, optimization should only occur when there is a demonstrated need, and then, only to the extent required to meet that need.
Keep in mind this is a learning exercise. Personally, I think your code will be more valuable to you in the future if you can come back to it and quickly understand the contained concepts than if you come back to a fully optimized version that is as tiny as possible. At least, keep an un-optimized version around for future reference.
http://en.wikipedia.org/wiki/Program_optimization#When_to_optimize
Just my two cents.
Fredrik Forsberg wrote
Nice list.
I have managed to compile the OS into 24834 instructions in asm and 23755 instructions in hack, which compares pretty well with Uli Kastlunger's results.
...
I would like to thank cadet1620 for his answers in project 7; I kept changing my code until I reached your benchmarks.
[See this post. --M]
I would really appreciate it if a similar benchmark could be given for the OS! That would give us happy amateurs/students something to strive towards.
Nice job on your VM translator.
Translating and assembling the supplied OS .vm files, I get
File | Lines |
asm | 23621 |
hack | 22887 |
--Mark
milythael wrote
I've seen a lot of posts recently about instruction count and that is fine up to a point. It is fun to puzzle something out to a set of constraints. I would only suggest the following be kept in mind as a student of software engineering.
Code is designed for more than one purpose, simultaneously. Everyone knows code is designed for machines to repeat a set of instructions. People are generally less aware that code is intended to communicate with people. Writing code is writing first and code second. Your code should help the reader understand what is happening. It should be clear, concise, and easy to follow.
Note that in the case of chapters 7 and 8 we are talking about machine-generated code that is the output of the VM translator [compiler back end]. Without optimization, the final output from compiling a Jack program will not fit in the Hack program memory.
I absolutely agree with milythael that source code must be human readable. My current work project involves integrating code from another programmer's past project into a product I've been developing. My next project will involve working on some code that I wrote about 10 years ago that needs some new functionality.
--Mark
My toolchain currently compiles the original OS .vm files to 17523 Hack instructions. This is achieved mostly by eliminating SP increments and decrements. Almost all VM operations come in 4 flavours, depending on whether they fetch their argument from the stack and whether they store their result on the stack. Next, a compare operation merges nicely with a following if-goto operation. Binary operations can be converted to unary ones when one of the arguments is a constant. The sequence push constant, add can thus sometimes be translated to
@constant
D=D+A
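A Python sketch of the compare/if-goto merge, assuming the book's functionName$label scheme for labels. It is my own illustration, not Dano's code. Like the common D = x - y translation of lt and gt, it can give the wrong answer when x - y overflows 16 bits.

```python
# Sketch: fuse a VM comparison with the if-goto that immediately follows it.

JUMP_FOR = {"lt": "JLT", "gt": "JGT", "eq": "JEQ"}


def compare_and_branch(op: str, label: str, function: str) -> list[str]:
    """Translate '<op>' followed by 'if-goto <label>' without pushing a boolean."""
    return [
        "@SP", "AM=M-1", "D=M",     # D = y, SP now points at y
        "A=A-1", "D=M-D",           # D = x - y
        "@SP", "M=M-1",             # pop x as well; D is untouched
        f"@{function}${label}",
        f"D;{JUMP_FOR[op]}",        # branch on the sign of x - y
    ]
```

The comparison result never becomes a true/false value on the stack, so the merged pair costs 9 words and the if-goto has nothing left to pop.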
I am working on a Jack-to-VM compiler now; maybe I will manage to squeeze that even more.
What seems more important to me is execution speed rather than code size (as long as the code fits into memory). I am therefore not that eager to factor out code of 6 or more instructions and jump there. I do it for call and return, which are real beasts, but not for comparisons.
This is interesting.
So basically, one could read more than one line at a time from the .vm file, understand what the sequence does, and make special cases of it to optimize the VM translation further.
That seems to make sense! It leaves the simplicity of the translator described in the book behind, though, so I will probably stick with my simpler version until I've finished the Jack compiler.
Thanks for all tips!
Hi Dano,
my implementation of a few of your suggestions, namely merging if-goto with a preceding comparison and add/sub with a preceding push constant, only saved me about 350 lines. Overall I am down to 23145 lines for the bare OS files; jumping to shared call and return routines, however, made a huge difference.
I don't understand what you mean by saying you got rid of SP increments and decrements. How did you do that?
Regards,
Peter
---Edit---
I think I now see what you mean: avoiding the SP increment of a push when the following command decrements SP again (e.g. most of the arithmetic commands and if-goto). I have now done this for every push followed by add/sub, which brings the assembly code for the OS .vm files down to ~22300 lines.
At this point, however, I can't imagine cutting another 5000 lines, as each further saving seems to matter less and less.
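A Python sketch of that push/add fusion, limited to push constant. It is my own illustration, with one extra special case for the constant 1, which needs no D register at all.

```python
def push_constant_then(op: str, value: int) -> list[str]:
    """Fuse 'push constant value' with a following 'add' or 'sub'.

    The constant never touches the stack, so SP is neither incremented by
    the push nor decremented by the arithmetic command.
    """
    if op not in ("add", "sub"):
        raise ValueError("only add and sub are fused here")
    if not 0 <= value <= 32767:
        raise ValueError("push constant takes a non-negative 15-bit value")
    if value == 1:                                   # x+1 / x-1 need no D at all
        return ["@SP", "A=M-1", "M=M+1" if op == "add" else "M=M-1"]
    combine = "M=D+M" if op == "add" else "M=M-D"    # x + c or x - c, in place
    return [f"@{value}", "D=A", "@SP", "A=M-1", combine]
```

With the usual translations, push constant (7 words) followed by add (5 words) costs 12 words; the fused form costs 5, or 3 for the constant 1. Note that the Hack assembler spells the sum as D+M; M+D is not a valid comp mnemonic.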
Hi,
My VM translator currently translates tools/OS/*.vm down to 20565 assembly instructions (not counting label lines and comments).
The project/11/Pong/ code, together with the OS, translates to 25679 instructions.
The translator does various peephole optimizations, including optimizing the following sorts of sequences:
push SEG1 OFF1 ; pop SEG2 OFF2
push SEG OFF ; push constant VAL ; add
push SEG OFF ; push constant VAL ; sub
sequences producing a value followed by unary operation (neg or not)
Also, it uses special-case code for various small constants like 0, 1, 0xffff, and sometimes others, and it uses library routines to support function calls and correct signed comparisons.
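For the first of those patterns, here is a Python sketch that copies a value between two pointer-based segments (local, argument, this, that) without touching the stack. It is my illustration, not dlk's code. Using R13 as scratch is my assumption; temp, pointer, static and constant would each need their own, shorter cases.

```python
# Sketch for 'push SEG1 OFF1; pop SEG2 OFF2' between pointer-based segments.

BASE = {"local": "LCL", "argument": "ARG", "this": "THIS", "that": "THAT"}


def copy_segment_value(src: str, src_off: int, dst: str, dst_off: int) -> list[str]:
    """Copy src[src_off] into dst[dst_off] without going through the stack."""
    return [
        f"@{BASE[dst]}", "D=M", f"@{dst_off}", "D=D+A",  # D = address of target
        "@R13", "M=D",                                   # park it in R13
        f"@{BASE[src]}", "D=M", f"@{src_off}", "A=D+A",  # A = address of source
        "D=M",                                           # D = value
        "@R13", "A=M", "M=D",                            # store at target
    ]
```

This is 14 words, while a typical straightforward push (10 words) plus pop (12 words) is 22.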
Of course, one could get the code MUCH smaller (though slower) if one were willing to implement an actual VM interpreter in Hack assembly, rather than just translate the VM code into hack assembly.
If it would be of interest and in keeping with the class rules, I could post my PongGame.asm file to pastebin.com...
I've been poking at optimization off and on in my spare time and am now down to 19624 words for the supplied OS .vm files. Dano's still in the lead by about 10%; he's ahead by about 1/2 instruction per VM command. I suspect that there's something common he figured out how to do one instruction shorter than I did...
We're averaging 4.5 to 5 instructions per VM command. It's going to be tough to beat that with an interpreter since the binary VM commands will need to be stored in ROM. You'll need a lot of code that looks something like
@15-bit data with 2 binary commands
D=A
@R15
M=D
@RIP
D=A
@interpreter
0;JMP
(RIP)
You'll need to pack binary VM commands at least 2-to-1.
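These figures can be checked against Mark's table earlier in the thread, whose per-command rows sum to 4082 VM commands. A short Python calculation: the word counts come from the posts, and the stub size is the 8 instruction words of the snippet above, since (RIP) is only a label.

```python
# Words-per-VM-command arithmetic from the figures in this thread.
VM_COMMANDS = 4082                    # sum of the per-command rows in Mark's table
results = {"Mark": 19624, "Dano": 17523}

for name, words in results.items():
    print(f"{name}: {words / VM_COMMANDS:.2f} words per VM command")

# Interpreter stub from the snippet: 8 ROM words, shared by the packed commands.
STUB_WORDS = 8
for packed in (1, 2, 3):
    print(f"{packed} command(s) per stub: {STUB_WORDS / packed:.2f} words each")
```

It gives about 4.81 words per command for Mark and 4.29 for Dano, the half-instruction gap mentioned above. It also shows that the stub alone only gets below ~4.8 words per command once each stub carries at least 2 commands, before counting the interpreter itself.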
--Mark
What I had in mind for an interpreter was a "Threaded Interpretive" code format like that used in some Forth implementations, in which each 16-bit word encodes either a constant push (say, if its high-order bit is zero), or the ROM address of a function to call (once you mask off the high-order bit). When the interpreter decodes one of these ROM-address words, it saves the address of the following word in a "register" and jumps to the given address to start executing machine code.
For small built-in functions of the VM, this machine code performs the specified operation and, when done, calls back to the interpreter to continue interpreting where it left off. Some more complicated operations, including user application function calls, involve pushing the return address on the/a stack and re-entering the interpreter at a new VM-code subroutine.
I think with this scheme, most VM operations are one 16-bit word (constant pushes, add, sub, and, or, eq, lt, gt, not, neg), two 16-bit words (general pushes and pops, goto, if-goto, return, function entry), or 3 16-bit words (function call).
Of course, to interpret each of these 16-bit VM-code words the interpreter executes many Hack instructions...
This would be a solution aimed at squeezing a lot of code into the available 32K words of ROM, not aimed at performance.
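A Python sketch of the word format described above: high bit 0 means push the 15-bit constant, high bit 1 means call the routine at the ROM address in the low 15 bits. The function names are hypothetical, and this only shows the encoding, not the interpreter loop.

```python
# Sketch of the threaded-code word format (encoding and decoding only).
HIGH_BIT = 0x8000


def encode_push_constant(value: int) -> int:
    if not 0 <= value < HIGH_BIT:
        raise ValueError("constant must fit in 15 bits")
    return value


def encode_routine(rom_address: int) -> int:
    if not 0 <= rom_address < HIGH_BIT:
        raise ValueError("ROM address must fit in 15 bits")
    return HIGH_BIT | rom_address


def decode(word: int) -> tuple[str, int]:
    """Return ('push', constant) or ('call', rom_address) for one 16-bit word."""
    if word & HIGH_BIT:
        return ("call", word & 0x7FFF)
    return ("push", word)
```

Since Hack ROM holds 32K words, every ROM address fits in the low 15 bits, so a single word is enough for both kinds of entry.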
The problem is that on the Hack computer the VM interpreter cannot read the instructions that it is supposed to interpret from ROM. Somehow the binary VM code needs to get into RAM.
It occurs to me that you could write tighter code to copy the instructions to RAM using something like
@ 15-bit instruction, or ~instruction if the MSB should be set
D=A or D=!A
@destRamPtr
AM=M+1
M=D
The added encoding efficiency of the threaded interpreter might be enough to reduce the size of larger programs.
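A Python sketch of a generator for that loader code. The pointer name destRamPtr is taken from the snippet. Hack spells bitwise not as `!`, so the complemented case is emitted as D=!A. Because of AM=M+1, the pointer must start one below the first target address.

```python
def ram_loader(words: list[int], pointer: str = "destRamPtr") -> list[str]:
    """Emit Hack code that stores each 16-bit word at ++pointer in RAM.

    '@' can only load 15-bit values, so a word with the MSB set is emitted as
    its bitwise complement and flipped back with D=!A.
    """
    code: list[str] = []
    for w in words:
        w &= 0xFFFF
        if w & 0x8000:
            code += [f"@{~w & 0xFFFF}", "D=!A"]   # complement fits in 15 bits
        else:
            code += [f"@{w}", "D=A"]
        code += [f"@{pointer}", "AM=M+1", "M=D"]  # pre-increment pointer, store
    return code
```

Each stored word costs 5 ROM words, which is why the packing density of the threaded format matters so much.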
--Mark
> The problem is that on the Hack computer the VM interpreter cannot read the instructions that it is supposed to interpret from ROM
Doh! That's what I was missing... you would definitely need architecture changes then to do what I wanted.
dlk wrote
> The problem is that on the Hack computer the VM interpreter cannot read the instructions that it is supposed to interpret from ROM
Doh! That's what I was missing... you would definitely need architecture changes then to do what I wanted.
See this post: Hack II: Escaping the Harvard straitjacket.
--Mark