My optimized Math.jack class takes 1.1M vmsteps to complete test

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

My optimized Math.jack class takes 1.1M vmsteps to complete test

The111
This is frustrating me quite a bit.  My first attempt at the Math.jack class, with no optimization (i.e. lots of multiplication calculations, etc) required 1.2M vmsteps to complete the .tst file.  I figured I was doing something wrong since 1.0M vmsteps were allotted by default, and through trial and error I found that the supplied Math.vm file only needs around 600k vmsteps.

So then I heavily optimized my class, spent several hours on it removing all multiplications, tabulating, saving, reusing values wherever possible.  My savings?  Almost nothing.  Still needed 1.1M vmsteps.

Here's where it gets really weird.  I truncated the .tst file down to ONE simple test (an absolute value calculation).  I STILL need more than 1.0M vmsteps to even do this.  A similar thing happens with the supplied Math.vm file... it will succeed with 600k vmsteps (but fail with 500k vmsteps) whether I am doing 1 of the 14 tests, or all 14 of them.  That makes no sense to me.  It implies that the majority of the computation happening is to do with the output, not the actual math operations, which would seem counterintuitive.  You would think that running 14 tests would take 14x as many vmsteps compared to 1 test.

If I try to execute the test with only 1.0M vmsteps, using my Math.vm, the program halts with the call stack as follows:

Sys.init
Output.init
Output.createShiftedMap
Math.multiply
Math.bit

Now, I'm not even sure what createShiftedMap is, since it is not something in our own OS contract.  I'm guessing it's made just for outputting the .out files.  But obviously it is calling my Math.multiply, and obviously that is where the performance hit is coming from.  Which I can't comprehend... since both my multiply and bit functions should be O(n)... I have only +, &, and = operators in each of these functions, and the multiply function has one loop of size n, while the bit function has no loops.  I can only guess that the createShiftedMap is calling my multiply function a HECK of a lot of times or something.  I haven't tried watching the call stack for an entire program run, I think that would take forever.

Even if createShiftedMap weren't using so many vmsteps, it is still clear comparing my 1.1M vmsteps needed vs the built-in version's 600k that there is still some inefficiency in my multiply function... 1.1M/600k=~2 means I should be able to halve the instructions required by mine, but I don't see how.  I could post the code here but not sure that is kosher.  I am so frustrated I may try decompiling the Math.vm supplied as suggested elsewhere in this forum... but that sounds painstaking.

Thanks!
Reply | Threaded
Open this post in threaded view
|

Re: My optimized Math.jack class takes 1.1M vmsteps to complete test

cadet1620
Administrator
I profiled Pong.hack a while ago and found that the OS initialization took 3.9 million CPU instructions. 24.5% of the time was in math.multiply.  See this post.

IIRC, Output.createShiftedMap is going through the font table and multiplying each entry by 256 to create the font table for characters that are drawn in the upper half-words of the screen.

Feel free to send me [More|Reply to author] your Math.jack and I'll let you know what I see.

--Mark