
This is frustrating me quite a bit. My first attempt at the Math.jack class, with no optimization (i.e. lots of multiplication calculations, etc) required 1.2M vmsteps to complete the .tst file. I figured I was doing something wrong since 1.0M vmsteps were allotted by default, and through trial and error I found that the supplied Math.vm file only needs around 600k vmsteps.
So then I heavily optimized my class, spent several hours on it removing all multiplications, tabulating, saving, reusing values wherever possible. My savings? Almost nothing. Still needed 1.1M vmsteps.
Here's where it gets really weird. I truncated the .tst file down to ONE simple test (an absolute value calculation). I STILL need more than 1.0M vmsteps to even do this. A similar thing happens with the supplied Math.vm file... it will succeed with 600k vmsteps (but fail with 500k vmsteps) whether I am doing 1 of the 14 tests, or all 14 of them. That makes no sense to me. It implies that the majority of the computation happening is to do with the output, not the actual math operations, which would seem counterintuitive. You would think that running 14 tests would take 14x as many vmsteps compared to 1 test.
If I try to execute the test with only 1.0M vmsteps, using my Math.vm, the program halts with the call stack as follows:
Sys.init
Output.init
Output.createShiftedMap
Math.multiply
Math.bit
Now, I'm not even sure what createShiftedMap is, since it is not something in our own OS contract. I'm guessing it's made just for outputting the .out files. But obviously it is calling my Math.multiply, and obviously that is where the performance hit is coming from. Which I can't comprehend... since both my multiply and bit functions should be O(n)... I have only +, &, and = operators in each of these functions, and the multiply function has one loop of size n, while the bit function has no loops. I can only guess that the createShiftedMap is calling my multiply function a HECK of a lot of times or something. I haven't tried watching the call stack for an entire program run, I think that would take forever.
Even if createShiftedMap weren't using so many vmsteps, it is still clear comparing my 1.1M vmsteps needed vs the builtin version's 600k that there is still some inefficiency in my multiply function... 1.1M/600k=~2 means I should be able to halve the instructions required by mine, but I don't see how. I could post the code here but not sure that is kosher. I am so frustrated I may try decompiling the Math.vm supplied as suggested elsewhere in this forum... but that sounds painstaking.
Thanks!
