The focus of the nand2tetris VM language is to provide a simple intermediate representation of programs that provides enough information to easily implement a backend/VM Translator for the Hack computer. By doing so, the language design eliminates away a lot of context that would be helpful for optimization. If you want to stay compatible with this, your approach of adding annotations in comments to the code is perfectly fine. Other optimization approaches that I have seen in this forum, worked with a second layer of IR, thus translating the Hack VM language into another richer VM language, or to use another IR right away.
The missing context can be recovered from the VM code, but requires extra work in some cases with a static analysis of the code. For example: it is possible to write a decompiler, that translates VM code back into a Jack source code file.
When I started to optimize my compiler/VM translator I added annotations in the same way as you did. Regarding the pop temp 0 optimization, the approach to do a DEC SP is perfectly fine and safe. You can even go one step beyond and omit the command completely. I have briefly written about this in my first post
here under 3. Improved VM codeThat requires to have 2 separate return routines for void and non-void functions where only the latter would leave a return value on the stack. The former doesn't do that and therefore no pop temp 0 or decrease of stack is needed anymore. This could be implemented with annotations as well (i.e. RETURN 0 //#VOID). A pitfall here is that a non-void function can be called with the DO statement as well, just ignoring the return value. In such a case you still need to decrease the stack pointer (so you need to keep track of the function return type as well, which IR usually do).
I've also briefly outlined there, that the second case of using temp 0 for arrays is unnecessary as well if the code for the index of the assignment is generated after the term. This is one of the cases where the missing context is not easily detected in the VM code. In the compiler this can be done by either using an AST or if you stick to the syntax-driven way, by shifting VM code/delay code emission.
Example:
push constant 2048
push static 0
add
push constant 14334
pop temp 0
pop pointer 1
push temp 0
pop that 0
could also be represented as:
// emit code for the term first; in case pointer 1 is written here, it doesn't matter
// because the address calculation of the assignment follows afterwards
push constant 14334
// leave the term on the stack and now take care of the assignment
push constant 2048
push static 0
add
pop pointer 1
pop that 0 // write term from above
Another optimization that I did, is to embed the information of constant indexes directly into the pop that:
push constant 14334
push static 0
pop pointer 1
pop that 2048
// now it is explicitly clear to the backend, that the 2048 serves as index
// before that information was only implicitly embedded in the add command
The next thing that can be done, is to avoid writing the pointer 1 twice, if the same array variable is accessed:
// write to static0[2048]
push constant 14334
push static 0
pop pointer 1
pop that 2048
// next statement writes to static0[2049]
push constant 2050
// push static 0 // unnecessary in this context
// pop pointer 1 // unnecessary in this context
pop that 2049
However, proper analysis needs to be done before such optimizations are applied.
I.e.
let a[1]=5;
let a=a+b;
let a[2]=6; // in this case pointer 1 needs to be set!