I just did a quick and dirty execution profile of Pong from project 6. It shows which functions the code spends most of its time in while executing. The VM_xxx entries (VM_LT, VM_CALL, and so on) are the helper routines the VM uses for comparison stack ops and for call and return. The profile was generated by running a .tst file that output the PC every 100 instructions, for a total runtime of 10 million instructions. Each row below gives the sample count, the percentage of that phase's samples, and the function. Here's what I found:
OS initialization -- 3.9 million instructions
17656 44.9% memory.alloc
9631 24.5% math.multiply
4831 12.3% VM_LT
1790 4.6% output.createshiftedmap
1508 3.8% VM_RETURN
1411 3.6% VM_CALL
946 2.4% VM_GT
649 1.7% math.abs
585 1.5% output.create
88 0.2% array.new
76 0.2% output.initmap
53 0.1% VM_EQ
28 0.1% screen.init
27 0.1% math.init
1 0.0% string.new
1 0.0% output.init
1 0.0% memory.init
ponggame.newinstance -- 1.1 million instructions
7619 69.5% screen.clearscreen
1508 13.8% VM_LT
691 6.3% memory.alloc
228 2.1% output.drawchar
150 1.4% screen.drawrectangle
144 1.3% math.divide
141 1.3% screen.updatelocation
130 1.2% math.multiply
119 1.1% VM_RETURN
111 1.0% VM_CALL
30 0.3% VM_GT
13 0.1% output.printchar
13 0.1% math.abs
10 0.1% string.charat
9 0.1% output.getmap
8 0.1% string.appendchar
7 0.1% output.printstring
7 0.1% VM_EQ
5 0.0% ponggame.new
5 0.0% ball.setdestination
2 0.0% output.movecursor
2 0.0% bat.draw
2 0.0% ball.new
1 0.0% string.new
1 0.0% bat.new
1 0.0% ball.show
1 0.0% ball.draw
ponggame.run -- 5 million instructions
16566 33.3% math.divide
9808 19.7% math.multiply
6111 12.3% screen.drawrectangle
3469 7.0% VM_RETURN
3238 6.5% VM_CALL
3118 6.3% VM_GT
2616 5.3% screen.updatelocation
2400 4.8% VM_LT
964 1.9% math.abs
425 0.9% VM_EQ
259 0.5% bat.move
257 0.5% ball.move
135 0.3% ball.draw
97 0.2% ponggame.moveball
79 0.2% ponggame.run
60 0.1% screen.setcolor
53 0.1% ball.show
50 0.1% ball.hide
24 0.0% memory.peek
17 0.0% keyboard.keypressed
6 0.0% ball.setdestination
5 0.0% ball.bounce
1 0.0% ball.getright
Although it isn't obvious from this view of the profile, most of the OS initialization time is spent generating the display character bitmaps, with RAM allocated for each character individually. This is effectively just moving numbers from ROM to RAM, because in the Harvard architecture there is no way to read the bitmaps out of ROM.
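For anyone who wants to try the same kind of sampling profile, here is a rough Python sketch of the post-processing step: turning a list of sampled PC values into a table like the ones above. It is not the script I actually used. The symbol table format (a list of start address and function name pairs) and the function names build_profile, format_profile, and estimate_instructions are assumptions made up for illustration; you would have to extract the real function start addresses from your own VM translator or assembler output.

```python
from bisect import bisect_right
from collections import Counter

SAMPLE_INTERVAL = 100  # instructions between PC samples, as in the profile above


def build_profile(pc_samples, symbols):
    """Count samples per function.

    symbols is a hypothetical list of (start_address, name) pairs; each
    function is assumed to cover every address from its start up to the
    start of the next function.
    """
    table = sorted(symbols)
    starts = [addr for addr, _ in table]
    counts = Counter()
    for pc in pc_samples:
        # Find the last function whose start address is <= pc.
        i = bisect_right(starts, pc) - 1
        name = table[i][1] if i >= 0 else "<unknown>"
        counts[name] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    # Rows are (sample count, percent of samples, function name),
    # sorted by descending sample count.
    return [(n, 100.0 * n / total, name) for name, n in counts.most_common()]


def format_profile(rows):
    """Render rows in the same 'count percent name' layout used above."""
    return "\n".join(f"{n} {pct:.1f}% {name}" for n, pct, name in rows)


def estimate_instructions(rows):
    """Estimate total instructions executed from the number of samples."""
    return sum(n for n, _, _ in rows) * SAMPLE_INTERVAL


if __name__ == "__main__":
    # Made-up addresses and samples, purely to show the shape of the data.
    symbols = [(0, "bootstrap"), (100, "memory.alloc"), (200, "math.multiply")]
    samples = [105, 150, 199, 250, 20]
    rows = build_profile(samples, symbols)
    print(format_profile(rows))
    print(estimate_instructions(rows), "instructions (estimated)")
```

Multiplying the sample count by the sampling interval is how the per-phase totals line up: the OS initialization rows add up to 39282 samples, which at 100 instructions per sample is about 3.9 million instructions.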
--Mark