I just did a quick and dirty execution profile of Pong from project 6. This shows what functions the code spends most of it's time in while it's executing. The VM_etc functions are the helper functions the VM uses to do comparison stack ops and call and return.  The profile was generated by running a .tst file that output the PC every 100 instructions for a total runtime of 10 million instructions.  Here's what I found:
OS initialization -- 3.9 million instructions
17656 44.9% memory.alloc
 9631 24.5% math.multiply
 4831 12.3% VM_LT
 1790  4.6% output.createshiftedmap
 1508  3.8% VM_RETURN
 1411  3.6% VM_CALL
  946  2.4% VM_GT
  649  1.7% math.abs
  585  1.5% output.create
   88  0.2% array.new
   76  0.2% output.initmap
   53  0.1% VM_EQ
   28  0.1% screen.init
   27  0.1% math.init
    1  0.0% string.new
    1  0.0% output.init
    1  0.0% memory.init
ponggame.newinstance -- 1.1 million instructions
 7619 69.5% screen.clearscreen
 1508 13.8% VM_LT
  691  6.3% memory.alloc
  228  2.1% output.drawchar
  150  1.4% screen.drawrectangle
  144  1.3% math.divide
  141  1.3% screen.updatelocation
  130  1.2% math.multiply
  119  1.1% VM_RETURN
  111  1.0% VM_CALL
   30  0.3% VM_GT
   13  0.1% output.printchar
   13  0.1% math.abs
   10  0.1% string.charat
    9  0.1% output.getmap
    8  0.1% string.appendchar
    7  0.1% output.printstring
    7  0.1% VM_EQ
    5  0.0% ponggame.new
    5  0.0% ball.setdestination
    2  0.0% output.movecursor
    2  0.0% bat.draw
    2  0.0% ball.new
    1  0.0% string.new
    1  0.0% bat.new
    1  0.0% ball.show
    1  0.0% ball.draw
ponggame.run -- 5 million instructions
16566 33.3% math.divide
 9808 19.7% math.multiply
 6111 12.3% screen.drawrectangle
 3469  7.0% VM_RETURN
 3238  6.5% VM_CALL
 3118  6.3% VM_GT
 2616  5.3% screen.updatelocation
 2400  4.8% VM_LT
  964  1.9% math.abs
  425  0.9% VM_EQ
  259  0.5% bat.move
  257  0.5% ball.move
  135  0.3% ball.draw
   97  0.2% ponggame.moveball
   79  0.2% ponggame.run
   60  0.1% screen.setcolor
   53  0.1% ball.show
   50  0.1% ball.hide
   24  0.0% memory.peek
   17  0.0% keyboard.keypressed
    6  0.0% ball.setdestination
    5  0.0% ball.bounce
    1  0.0% ball.getright
Although it isn't obvious from these views of the profiling, most of the OS initialization time is spent generating the display character bitmaps, allocating RAM for each character individually -- effectively just moving numbers from ROM to RAM because there is no way to read the bitmaps out of ROM in the Harvard architecture. 
--Mark