For a lunchtime back-of-the-envelope doodle, this is quite a clever design.
Implementing this architecture in LogiSim has been on my to-do list for quite a while, and I finally got around to it.


Note: type on the keyboard while the program is running to change the message.
Along the way I discovered some interesting details.
Easily extended Jump instruction
As I was wiring up the PC load address for the Jump instruction (Branch with jjj = 0), I had a rather snaky wire running its way to R0. It went past the ALU on its way and I got to thinking...
The Jump instruction, 11 000 xxxxxxxxxxx, has all the correct bits available in the undefined region
to turn it into a PC= instruction with full ALU computation including indirect src1.
    11 000 iss oooooo ss
This may be quite useful for return IPs for assembly language subroutines, computed jumps and implementation 
of call and return stack functions.
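As a rough illustration of the bit layout above (not part of the LogiSim build), here is a minimal Python sketch that splits a 16-bit word into the fields of the proposed PC= instruction. The field names, and the reading of i as the indirect flag, the first ss as src1, oooooo as the ALU function and the last ss as src2, are assumptions made for the example; only the bit pattern itself comes from the post.

```python
# Field layout from the post, most significant bit first:
#   11 000 i ss oooooo ss   (2 + 3 + 1 + 2 + 6 + 2 = 16 bits)
# The meaning attached to each field below is an assumption.

def decode_pc_assign(word):
    """Split a 16-bit instruction word into the proposed PC= fields."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word must fit in 16 bits")
    return {
        "opcode":   (word >> 14) & 0b11,      # 11 selects the Branch class
        "jjj":      (word >> 11) & 0b111,     # 000 means unconditional Jump
        "indirect": (word >> 10) & 0b1,       # i: assumed indirect-src1 flag
        "src1":     (word >> 8) & 0b11,       # ss: assumed src1 register
        "alu_op":   (word >> 2) & 0b111111,   # oooooo: assumed ALU function
        "src2":     word & 0b11,              # ss: assumed src2 register
    }


def is_pc_assign(word):
    """True when the word is a Branch with jjj = 0, i.e. a Jump / PC= form."""
    fields = decode_pc_assign(word)
    return fields["opcode"] == 0b11 and fields["jjj"] == 0
```

For instance, `decode_pc_assign(0b1100011010101001)` separates the word into the six fields so each one can be routed to the corresponding register-select or ALU control lines in a simulator.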
It appears useful if the PC= instruction does not set the status bits; there may be a way to use the
zr flag as a subroutine pass/fail return value.
        R0 = RIP_1
        R3 = R0
        BR      Function
    RIP_1:
        BNE     Error
        ...
    Function:
        ....
        R0 = 0  // good result
        PC = R3
Status bits and overloaded R0=n instruction
There is a problem with the above code snippet. Depending on how the assembler encodes the R0=0 instruction, it may be an A-instruction that does not set the zr flag.
I changed the data routing for the A-instruction: instead of muxing the data directly into R0, I am muxing it into the ALU x input, setting the ALU function to out=x, and loading the status register.
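The effect of that routing change can be modelled in a few lines of Python. This is only a behavioural sketch, assuming a 16-bit data path and a status register that holds just the zr flag; the function names `alu_pass_x` and `load_r0` are made up for the example and do not correspond to anything in the circuit.

```python
def alu_pass_x(x):
    """Model the ALU with its function set to out=x, reporting the zr flag."""
    out = x & 0xFFFF          # assumed 16-bit data path
    zr = (out == 0)           # zr is set when the ALU output is zero
    return out, zr


def load_r0(regs, status, n):
    """A-instruction R0=n routed through the ALU so the status register loads."""
    out, zr = alu_pass_x(n)
    regs[0] = out
    status["zr"] = zr


regs = [0, 0, 0, 0]           # R0..R3
status = {"zr": False}
load_r0(regs, status, 0)      # R0 = 0 now sets zr, so a following BNE falls through
```

With the original direct mux into R0, `status` would have been left untouched by the load, which is exactly the problem with the pass/fail return value described above.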
More macros
If R0 is considered a scratch register that the assembler may use at will, then additional useful macros are possible.
For example:
    Rd=-6               R0=6; Rd=-R0
    *Rd=123,            R0=123; *Rd=R0
    Rd=Rs+16, s!=0      R0=16; Rd=Rs+R0
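A minimal sketch of how an assembler might expand these three macros, written in Python. The function name `expand_macro`, the regular expressions and the assumption of four registers R0..R3 are illustrative only; the restriction that the pointer register in `*Rd=n` cannot be R0 is also an assumption, added because loading the constant into R0 would overwrite the pointer.

```python
import re


def expand_macro(stmt):
    """Expand one macro statement into real instructions, using R0 as scratch.

    Statements that match none of the three patterns are returned unchanged.
    """
    s = stmt.replace(" ", "")

    # Rd=-n  ->  R0=n; Rd=-R0
    m = re.fullmatch(r"(R[0-3])=-(\d+)", s)
    if m:
        rd, n = m.groups()
        return [f"R0={n}", f"{rd}=-R0"]

    # *Rd=n  ->  R0=n; *Rd=R0   (Rd must not be R0: assumed restriction)
    m = re.fullmatch(r"\*(R[1-3])=(\d+)", s)
    if m:
        rd, n = m.groups()
        return [f"R0={n}", f"*{rd}=R0"]

    # Rd=Rs+n, s != 0  ->  R0=n; Rd=Rs+R0
    m = re.fullmatch(r"(R[0-3])=(R[1-3])\+(\d+)", s)
    if m:
        rd, rs, n = m.groups()
        return [f"R0={n}", f"{rd}={rs}+R0"]

    return [stmt]
```

For example, `expand_macro("R2=-6")` yields `["R0=6", "R2=-R0"]`, matching the first line of the list above.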
How much do the extra registers help?
Here is an assembly version of C's strcpy. The parameters are pointers to the string data.
On Hack, the parameters and RIP are passed in memory variables; on Mack they are passed in registers.
    Hack                         | Mack
    -----------------------------+----------------------------------
    // call strcpy (dest, src)   | // call strcpy (dest, src)
            @src                 |         r0=src
            D=M                  |         r1=*r0
            @strcpy_src          |         r0=dest
            M=D                  |         r2=*r0
            @dest                |         r0=rip_123
            D=M                  |         r3=r0
            @strcpy_dest         |         r0=strcpy  // jump strcpy
            M=D                  |         PC=r0      // macro
            @rip_123             | rip_123: ...
            D=A                  |
            @strcpy_rip          |
            M=D                  |
            @strcpy              |
            0;JMP                |
            ...                  |
                                 |
    strcpy: @strcpy_src          | strcpy: r1=r1-1
            M=M-1                |         r2=r2-1
            @strcpy_dest         |
            M=M-1                |
                                 |
    loop:   @strcpy_src          | loop:   r1=r1+1
            AM=M+1               |         r2=r2+1
            D=M                  |         r0=*r1
            @strcpy_dest         |         *r2=r0
            AM=M+1               |         bne loop
            M=D                  |
            @loop                |
            D;JNE                |
                                 |
            @strcpy_rip          |         PC=r3
            A=M                  |
            0;JMP                |
The Hack call + function runs in 23 + 8n instructions, the Mack call + function in 11 + 5n, where n is the string length including the terminating 0.
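To get a feel for what those two formulas mean in practice, the short Python sketch below evaluates them for a few string lengths. The function names are invented for the example; the constants are the ones quoted above.

```python
def hack_cost(n):
    """Instructions for the Hack call + strcpy, n includes the terminating 0."""
    return 23 + 8 * n


def mack_cost(n):
    """Instructions for the Mack call + strcpy, n includes the terminating 0."""
    return 11 + 5 * n


if __name__ == "__main__":
    for n in (1, 10, 100):
        h, m = hack_cost(n), mack_cost(n)
        print(f"n={n:3d}  Hack={h:4d}  Mack={m:4d}  ratio={h / m:.2f}")
```

For long strings the fixed call overhead stops mattering and the ratio approaches 8/5 = 1.6, so the extra registers save a little over a third of the instructions in the copy loop.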
 
--Mark