You can definitely get Pong to fit in 32K. You just need a better compilation toolchain.
You can optimize a ton of what the book's VM-to-assembly translator generates, for example by recognizing push/pop pairs that are really just plain copies.
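To make the push/pop idea concrete, here is a rough Python sketch of that kind of peephole pass. It is not the code from my project, just an illustration: the names `load_value`, `store_value`, `translate`, the `fallback` callback and the `MAX_FUSED_INDEX` cutoff are all made up for this example. It only fuses a `push` immediately followed by a `pop` on the `local`/`argument`/`this`/`that` segments (plus `push constant`), and hands everything else to whatever per-command translator you already have.

```python
# Base-pointer symbols for the four pointer-addressed VM segments.
SEGMENT_BASE = {"local": "LCL", "argument": "ARG", "this": "THIS", "that": "THAT"}

# Assumed cutoff: beyond this, stepping A one cell at a time stops paying off.
MAX_FUSED_INDEX = 4


def load_value(segment, index):
    """Hack assembly that leaves the value of `segment index` in D."""
    if segment == "constant":
        return [f"@{index}", "D=A"]
    return [f"@{SEGMENT_BASE[segment]}", "D=M", f"@{index}", "A=D+A", "D=M"]


def store_value(segment, index):
    """Hack assembly that writes D to `segment index` without using the stack."""
    # Step A forward one cell at a time so D keeps the value being copied.
    return [f"@{SEGMENT_BASE[segment]}", "A=M"] + ["A=A+1"] * index + ["M=D"]


def is_fusable_push(parts):
    return (len(parts) == 3 and parts[0] == "push"
            and (parts[1] in SEGMENT_BASE or parts[1] == "constant"))


def is_fusable_pop(parts):
    return (len(parts) == 3 and parts[0] == "pop"
            and parts[1] in SEGMENT_BASE
            and int(parts[2]) <= MAX_FUSED_INDEX)


def translate(vm_lines, fallback):
    """Translate VM commands, fusing adjacent push/pop pairs into direct copies.

    `fallback` is a hypothetical per-command translator (your existing one)
    that takes one VM command string and returns a list of assembly lines.
    """
    asm = []
    i = 0
    while i < len(vm_lines):
        cur = vm_lines[i].split()
        nxt = vm_lines[i + 1].split() if i + 1 < len(vm_lines) else []
        if is_fusable_push(cur) and is_fusable_pop(nxt):
            # push x; pop y  ==>  y = x, with no SP traffic at all.
            asm += load_value(cur[1], int(cur[2]))
            asm += store_value(nxt[1], int(nxt[2]))
            i += 2
        else:
            asm += fallback(vm_lines[i])
            i += 1
    return asm
```

With this sketch, `push local 0` / `pop local 1` comes out as 9 instructions, where a straightforward translator that goes through the stack and a temp register typically needs around twenty. Pairs like this show up all over compiled Jack code, so the savings add up quickly.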
You can also be smart about only including the bits of the runtime that you actually need.
See my hack project:
https://github.com/pm100/hack