Try running your program in the CPU Emulator with Animation set for Program Flow and the speed set to Fast.
Does that behavior look like what you want in terms of program flow? (Don't worry about what does and does not appear on the screen.)
Now reset the program and single step through it. Does that program flow seem reasonable?
Remember, this program should run indefinitely. If you start it running and then wait five minutes you should have a completely white screen. If you then hold down a key and wait five minutes, you should have a completely black screen. If you then release the key and wait five minutes, you should have a completely white screen again. You should be able to repeat this over and over as long as you want.
Your code has a lot of hard-coded jump targets, which is very bad style. Why do you want to jump to the 29th line? A comment telling the reader that you are going to the 29th line is completely useless; it's trivial to see that the two instructions
@29
D;JEQ
will go to the 29th line if D is equal to 0.
The comment should indicate WHY you are performing the jump.
Now consider what happens if you need to put a couple of instructions at the beginning to initialize some variable that you add, or you remove an instruction near the beginning that turns out not to be needed. Would you still want to go to the 29th line? No. You wanted line 29 (which is actually the 30th instruction, since the first instruction is at address 0) only because that is where the beginning of some code that does something is located. So put a meaningful label just before that line of code and then load that label into the A register instead of a hard-coded number. That not only makes the code much, much more readable, but it also lets the assembler keep track of what the correct line number is as you modify your code.
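To see why a label survives edits when a hard-coded number does not, here is a minimal Python sketch of an assembler's first pass. The helper name `label_addresses`, the variable `i`, and the labels `WHITE` and `BLACK` are hypothetical, chosen only for illustration; the sketch assumes the usual Hack rule that label declarations, comments, and blank lines take up no ROM address.

```python
def label_addresses(lines):
    """First pass of a simplified Hack assembler (hypothetical helper):
    map each (LABEL) to the ROM address of the instruction after it."""
    table = {}
    rom_address = 0
    for line in lines:
        code = line.split("//", 1)[0].strip()  # drop comments and whitespace
        if not code:
            continue  # blank or comment-only lines occupy no ROM address
        if code.startswith("(") and code.endswith(")"):
            table[code[1:-1]] = rom_address  # labels occupy no ROM address
        else:
            rom_address += 1
    return table


# Hypothetical skeleton: the labels name WHAT the code does, not WHERE it is.
original = [
    "@KBD",
    "D=M",
    "@WHITE",
    "D;JEQ   // no key pressed: go clear the screen",
    "@BLACK",
    "0;JMP   // key pressed: go fill the screen",
    "(WHITE)",
    "@WHITE",
    "0;JMP   // placeholder for the real clearing code",
    "(BLACK)",
    "@BLACK",
    "0;JMP   // placeholder for the real filling code",
]

# Add two initialization instructions at the very beginning.
edited = ["@i", "M=0"] + original

print(label_addresses(original))  # {'WHITE': 6, 'BLACK': 8}
print(label_addresses(edited))    # {'WHITE': 8, 'BLACK': 10}
```

A hard-coded `@6` would still send control to address 6 after the edit, which is now the `@BLACK` instruction in the middle of the wrong code; `@WHITE` follows its target automatically.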
To underscore this point, what line do you THINK this code jumps to when it jumps to 29? I'll wager that it's not the line you wanted it to jump to.
It looks like you also need to understand the difference between labels and variables.
The ONLY label you have in your entire program is END. Everything else you have is a variable. Labels are defined using the parens.
(END)
@END
0;JMP // infinite loop
The assembler makes a first pass through the code and keeps track of the address that each instruction will end up at. In the case of your code, it will get down to the (END) line and know that the next instruction will end up at address 55 in the program ROM. So it associates the symbol "END" with the value 55.
On the second pass, when the instructions are actually generated, it replaces each instance of "END" with the number 55, so this code snippet becomes
@55
0;JMP
The first pass only identifies which symbols are labels and associates a number with each of them. On the second pass, when a symbol is encountered, the assembler looks it up in its symbol table, which starts out holding the predefined symbols and then receives the labels found during the first pass. If the symbol is there, the assembler replaces it with the associated value. If it isn't, the assembler assumes it is a variable, adds that symbol to the table, and associates it with a memory location in RAM. The first variable it encounters goes at RAM address 16, and subsequent variables get put at successively higher RAM addresses.
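Here is a minimal Python sketch of that two-pass symbol handling. It is a simplified model, not the official Nand2Tetris assembler: the function name `resolve_symbols` and the sample program are made up for illustration, it only resolves symbols (no translation to binary), and it assumes the standard Hack predefined symbols (SP, LCL, ARG, THIS, THAT, R0 through R15, SCREEN, KBD).

```python
# Assumed standard Hack predefined symbols.
PREDEFINED = {"SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4,
              "SCREEN": 16384, "KBD": 24576,
              **{f"R{i}": i for i in range(16)}}  # only R0..R15


def resolve_symbols(lines):
    """Return (symbol_table, resolved_instructions) for Hack assembly lines."""
    # Strip comments and whitespace; drop blank lines.
    cleaned = []
    for line in lines:
        code = line.split("//", 1)[0].strip()
        if code:
            cleaned.append(code)

    table = dict(PREDEFINED)

    # Pass 1: each (LABEL) gets the ROM address of the next real instruction.
    rom_address = 0
    for code in cleaned:
        if code.startswith("(") and code.endswith(")"):
            table[code[1:-1]] = rom_address
        else:
            rom_address += 1

    # Pass 2: replace symbols; unknown ones become variables from RAM 16 up.
    next_var = 16
    resolved = []
    for code in cleaned:
        if code.startswith("("):
            continue  # label declarations generate no instruction
        if code.startswith("@") and not code[1:].isdigit():
            symbol = code[1:]
            if symbol not in table:
                table[symbol] = next_var
                next_var += 1
            code = f"@{table[symbol]}"
        resolved.append(code)
    return table, resolved


program = [
    "@R5",      # predefined (R0..R15 only)
    "D=M",
    "@R24576",  # NOT predefined, so it becomes a variable
    "M=D",
    "@KBD",     # predefined keyboard register
    "D=M",
    "(END)",
    "@END",
    "0;JMP  // infinite loop",
]
table, code = resolve_symbols(program)
print(code)  # ['@5', 'D=M', '@16', 'M=D', '@24576', 'D=M', '@6', '0;JMP']
```

Note that `@R24576` silently turns into `@16`: nothing in the assembler warns you that a symbol you meant as an address was treated as a brand-new variable.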
So the first two lines of your code:
@LOOP
@R24576
are seen as just two variables, since neither has an associated label and neither is a predefined symbol (only R0 through R15 are predefined). In the rest of the code, "LOOP" will be replaced with the number 16 and "R24576" with the number 17. So your first few lines of code:
@LOOP
@R24576
D=M // D=M[24576]
@29
D;JEQ // if D == 0, goto 29th line
@6
0;JNE // if D != 0, goto 6th line
become
@16
@17
D=M
@29
D;JEQ
@6
0;JNE
So your program is not going to look at the keyboard buffer, but rather at the value stored in M[17], and make its decision based on that.
There are also predefined symbols: SCREEN and KBD are already associated with 16384 and 24576, respectively, so when you want to refer to the beginning of the screen memory or to the keyboard buffer, use these symbols. It will make your code much more readable.
Deal with these issues and see how far you can get.
As for whether it's possible to do the program without a nested loop, not really. But your approach will end up with three loops nested together, while it can be done in two. Remember that while we think of the screen as consisting of rows and columns, the entire screen memory is simply a block of RAM addresses starting at SCREEN.
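A quick Python check of that last point, assuming the standard Hack screen layout (256 rows of 512 pixels, 16 pixels per word, so 32 words per row): visiting the screen row by row and word by word hits exactly the same addresses, in the same order, as a single counter running from SCREEN up to KBD. The names `ROWS` and `WORDS_PER_ROW` are just for illustration.

```python
SCREEN = 16384                 # base address of the screen memory map
KBD = 24576                    # keyboard register, right after the screen
ROWS, WORDS_PER_ROW = 256, 32  # 512 pixels per row / 16 bits per word

# Row/column view: address of the word at (row, word column).
grid_addresses = [SCREEN + row * WORDS_PER_ROW + col
                  for row in range(ROWS) for col in range(WORDS_PER_ROW)]

# Flat view: one counter from SCREEN up to (but not including) KBD.
flat_addresses = list(range(SCREEN, KBD))

print(grid_addresses == flat_addresses)  # True
print(len(flat_addresses))               # 8192
```

Because the two views are identical, the separate row and column loops collapse into one loop over 8192 words, leaving only the outer "run forever" loop around it.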
Keep in mind that efficiency isn't the goal for this project -- any program that meets the specifications is acceptable. Having said that, it is educational to see how few instructions you can get the code down to. It can be done with no more than 19 instructions (I can't rule out that it can be done with fewer).