Ideas for a Slightly More Powerful CPU

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view

Ideas for a Slightly More Powerful CPU

Warren Toomey
I sat down during lunch today and doodled up a design for a replacement CPU for the Hack machine with the same API, but with a few more registers in it, and which has a more traditional machine language. Here's the design, hopefully my flat text document will come out. Comments on the design are most welcome!

            Mack: Warren's Idea for a Slightly More Powerful CPU

Goal: to build a different CPU, using the existing builtin Tecs chips with some extra control logic. Keep the same ALU and the same branch logic, but have multiple registers, and some CPU state flags, as per more popular CPUs.

Basic architecture: ALU; four 16-bit registers: R0, R1, R2, R3; two flag bits (negative and zero) which are set by the last ALU operation. Each instruction is 16 bits. There is up to one RAM fetch and one RAM store in each instruction.

Instruction formats: there are 3 instruction formats: load R0 with a constant, perform an ALU operation, and branch or jump. Here they are:

Load R0 instruction

  0 <15-bit unsigned constant>          Load R0 <- constant

ALU operation instruction

  10 idd iss oooooo ss                  meaning dst = src1 op src2

        dst = src1 op src2              Same 6-bits to control existing ALU,
                                                and 2 bits to identify the dst, src1
                                                and src2 registers.

                                        When i bit is set in iss, we load
                                        from RAM[src1], not src1. When i bit
                                        is set in idd, we save result to
                                        RAM[dst], not the dst register.

                                        Result of the ALU operation clears or
                                        sets the two flag bits: zero, negative.

Branch/Jump instruction
This instruction can encode a relative branch of up to 2048 instructions, or it can encode an absolute jump to the value store in R0.

  11 jjj <11-bit signed constant>       When any of the j bits are set,
                                                      they act like the Hack jump bits.
                                                      Examine the zero and negative flags.
                                                     If a branch, sign extend the constant
                                                     to 16 bits, and add it to the PC.

                                        However, when all the j bits are off,
                                        this is an unconditional jump. Load
                                        the contents of R0 into the PC.

Assembly Syntax

label:                                identifier followed by colon
  R0 = constant                 Load R0 with constant instruction

  [*]Rn = [[*]Rn] op Rn         ALU instruction, * means indirect addressing,
                                          [ ] means that this is optional

  Bxx label                     Relative branch to label, +/- 2048 instructions
                                    Bxx is BEQ, BNE, BLT, BLE, BGT, BGE, or BR
                                    for an unconditional branch.

  JMP label                     This is an absolute jump "macro" instruction.
                                    The assembler translates this into: load R0
                                    with the label, followed by a jump instruction.

Also another macro instruction:

        Rn = constant           If Rn != R0, do R0 = constant, Rn = R0.

Thus, R0 becomes a sort of "scratch" register.

Sum the Numbers from 1 to 100

  R1 = 1                // i = 1
  R2 = 0                // sum = 0
  R2 = R2 + R1          // sum = sum + i
  R1 = R1 + 1           // i++
  R0 = 100
  R0 = R1 - R0         // calculate i - 100
  BLE loop                // loop while i <= 100
  JMP end                 // end with infinite loop
Reply | Threaded
Open this post in threaded view

Re: Ideas for a Slightly More Powerful CPU

Warren Toomey
I found a bug. The general Hack computer architecture has only one address bus from CPU to memory, used to both identify the location of the data to fetch from RAM and write back to RAM. Therefore, the CPU cannot read from one address and write back to a different address in an instruction.

This means my CPU cannot do, for example, *R3 = *R2 + 1, i.e. load from the address which R2 points to, and write back to the address in R3.

So, the solution is, when the i bit is set in both idd and iss, the src1 value (i.e the ss in iss) is replaced with the destination value. This allow the CPU to do *R3 = *R3 + 1.
Reply | Threaded
Open this post in threaded view

Re: Ideas for a Slightly More Powerful CPU

For a lunch time back-of-the-envelope doodle, this is quite a clever design.

Implementing this architecture in LogiSim has been on my to do list for quite a while and I finally got around to it.

Mack computer in LogiSim

Mack computer CPU in LogiSim

Note: type in the keyboard while the program is running to change the message.

Mack.circLogiSim circuit
MackHello.lsbin  Hello world program (if you need to reload the ROM)
MackHello.lstCommented source

Along the way I discovered some interesting details.

Easily extended Jump instruction

As I was wiring up the PC load address for the Jump instruction (Branch with jjj = 0) I had a rather snaky wire running its way to R0. It went past the ALU on it way and I got to thinking...

The Jump instruction, 11 000 xxxxxxxxxxx, has all the correct bits available in the undefined region to turn it into a PC= instruction with full ALU computation including indirect src1.
    11 000 iss oooooo ss

This may be quite useful for return IPs for assembly language subroutines, computed jumps and implementation of call and return stack functions. It appears useful if the PC= instruction does not set the status bits; there may be a way to use the zr flag as a subroutine pass/fail return value.

        R0 = RIP_1
        R3 = R0
        BR      Function
        BNE     Error

        R0 = 0  // good result
        PC = R3

Status bits and overloaded R0=n instruction

There is a problem with the above code snippet. Depending on how the assembler encodes the R0=0 instruction, it may be an A-instruction that does not set the zr flag. I changed the data routing for the A-instruction—instead of muxing the data directly into R0, I am muxing it into the ALU x input, setting the ALU function to out=x, and loading the status register.

More macros

If R0 is considered a scratch register that the assembler my use at will, then additional useful macros are possible. For example:
    Rd=-6               R0=6; Rd=-R0
    *Rd=123,            R0=123; *Rd=R0
    Rd=Rs+16, s!=0      R0=16; Rd=Rs+R0

How much do the extra registers help?

Here is an assembly version of C's strcpy. The parameters are pointers to the string data.
On Hack, the parameters and RIP are passed in memory varaibles; on Mack they are passed in registers.
      Hack       Mack
// call strcpy (dest, src)
strcpy:	@strcpy_src
loop:	@strcpy_src
// call strcpy (dest, src)
	r0=strcpy  // jump strcpy
	PC=r0      // macro
rip123:	...
strcpy:	r1=r1-1
loop:	r1=r1+1
	bne loop

The Hack call + function runs in 23 + 8n instructions, the Mack call + function in 11 + 5n, where n is the string length including the terminating 0.



Reply | Threaded
Open this post in threaded view

Re: Ideas for a Slightly More Powerful CPU

This post was updated on .
In reply to this post by Warren Toomey
How does the relative branch work? For example in the code snippet shared:

// Sum the Numbers from 1 to 100
// ------------------------------------

0:    R1 = 1                    // i = 1
1:    R2 = 0                    // sum = 0
2:    R2 = R2 + R1          // sum = sum + i
3:    R1 = R1 + 1            // i++
4:    R0 = 100
5:    R0 = R1 - R0           // calculate i - 100
6:    BLE loop                  // loop while i <= 100
7:    JMP end                  // end with infinite loop

What would be the 11bit constant used for the BLE?

loop is at address 2. The BLE instruction is at address 6. Given that for a branch we "sign extend the constant to 16 bits, and add it to the PC", we would have to add -4 to the PC to get to loop from the BLE instruction. -4 is 65532 which is more than can be represented with 11 bits...


-1 is bin(65535) is 0b1111111111111111
-2 is bin(65534) is 0b1111111111111110
-3 is bin(65533) is 0b1111111111111101
-4 is bin(65532) is 0b1111111111111100
-5 is bin(65531) is 0b1111111111111011

Seems to be a pattern that is somehow useful...?
Reply | Threaded
Open this post in threaded view

Re: Ideas for a Slightly More Powerful CPU

Also, how would you conditionally branch to a location that is more than 2^11 addresses away from the branch instruction?
Reply | Threaded
Open this post in threaded view

Re: Ideas for a Slightly More Powerful CPU

Sign extension is how shorter 2's complement numbers are converted to longer numbers. Take the sign bit from the shorter number and use it for all the high order bits of the longer number.
      sxx xxxx xxxx     11-bit 2's complement
ssss ssxx xxxx xxxx     16-bit 2's complement

Conditional jumps to locations more than +/- 1023 away are quite rare because they are primarily only used in conditional and looping control structures. The dependent code in ifs and loops should be relatively small.

The only way to code an absolute conditional jump would be to use a relative "skip" around a jump.
    BNE skip        // BEQ far_away
    JMP far_away
The assembler would need to flag any conditional jumps that are too far away as errors and the programmer would need to fix them. The assembler can not do this automatically because the absolute jump uses R0, and it cannot know if R0 is safe to use.

Reply | Threaded
Open this post in threaded view

Re: Ideas for a Slightly More Powerful CPU

Ah ok, thank you!