ALU - performance implications

ALU - performance implications

Eugeniu
I've just successfully tested the ALU.

Hopefully this will not be considered a spoiler but in my ALU design the multiplexor gate is the main component. In connection to that aspect I have the following questions:

1) Isn't it redundant that, to compute a given result, the ALU actually computes all possible variants and then selects the correct one? Is this how all ALUs work in the real world, or is it just a consequence of this specific ALU's design? A related question: how was this ALU actually designed? The book mentions backwards reasoning, starting from the results and deducing the inputs, but doesn't actually state how that was done, so for me this is one level of abstraction that is still unclear. For example, if I wanted to add comparison operations, multiplication, division, etc., how would I determine the primitive operations for my inputs (in this case they were negation, zeroing, addition, and anding)?

2) Again, given that all possible results are actually computed, does this mean the CPU frequency is limited by its longest operation, and that there is no way to perform the less complicated operations faster? I guess having different speeds for different operations would introduce the issue of knowing when the CPU outputs are actually valid and can be stored in a register. I guess having a single clock frequency that accommodates all operations greatly simplifies the design. What do you think?

By the way, could the ALU be implemented in a fundamentally different way than I did (e.g. not using multiplexors at all)?

Re: ALU - performance implications

cadet1620
Administrator
Eugeniu wrote
1) Isn't it redundant that, to compute a given result, the ALU actually computes all possible variants and then selects the correct one? Is this how all ALUs work in the real world, or is it just a consequence of this specific ALU's design?
In hardware, it's generally easier to compute multiple values and then select which one to use. The hardware is always there and always computing.
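To make this concrete, here is a small Python sketch (mine, not from the thread or the book) of a toy 4-function ALU that, like the Hack ALU, computes every candidate result unconditionally and then uses a selector, the software analogue of a multiplexor, to pick one:

```python
MASK = 0xFFFF  # 16-bit word

def toy_alu(x, y, op):
    """Compute ALL candidate results, then select one.

    In hardware every branch is physically present and always
    switching; 'op' only chooses which output to pass through.
    """
    results = [
        (x + y) & MASK,   # op 0: add
        x & y,            # op 1: and
        (~x) & MASK,      # op 2: not x
        0,                # op 3: zero
    ]
    return results[op]    # the "multiplexor": pick one precomputed value

assert toy_alu(3, 4, 0) == 7
assert toy_alu(0b1100, 0b1010, 1) == 0b1000
```

The function names and opcode assignments here are illustrative only; the point is the structure: all results exist before the selection happens.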
A related question: how was this ALU actually designed?
I don't know how this particular ALU was designed. It's a very neat solution. Most often this sort of design starts with an "aha" moment of inspiration -- for instance, realizing that ~(~a plus b) = a-b -- and then working out how the remaining logical operations can fit into that structure.
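The identity ~(~a plus b) = a-b falls out of two's complement, where ~x = -x-1, so ~(~a+b) = -(-a-1+b)-1 = a-b. A quick Python check over 16-bit words (an illustration I've added, not part of the original post):

```python
import itertools

MASK = 0xFFFF  # 16-bit two's complement arithmetic

# Verify ~(~a + b) == a - b (mod 2^16) on a spread of corner cases.
samples = [0, 1, 2, 100, 0x7FFF, 0x8000, 0xFFFF]
for a, b in itertools.product(samples, repeat=2):
    lhs = (~(((~a) & MASK) + b)) & MASK   # ~(~a plus b), wrapped to 16 bits
    rhs = (a - b) & MASK                  # a - b, wrapped to 16 bits
    assert lhs == rhs
```

This is why a single adder, fed through conditional negation of its input and output, can do subtraction for free.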

If I had to design an ALU that had the Hack functionality I would have ended up with separate adder/subtractor and Boolean units resulting in about twice the hardware.
2) Again, given that all possible results are actually computed, does this mean the CPU frequency is limited by its longest operation, and that there is no way to perform the less complicated operations faster? I guess having different speeds for different operations would introduce the issue of knowing when the CPU outputs are actually valid and can be stored in a register. I guess having a single clock frequency that accommodates all operations greatly simplifies the design. What do you think?
One of the engineer's jobs is what's called "worst-case timing analysis," which is exactly that: finding the longest path through the various circuits making up the computer. One technique used when some operations are slower than others is to add "wait states" to the slower instructions. Basically, the CPU holds the computer for the required number of clock cycles before writing the result to memory and incrementing the program counter.
By the way, could the ALU be implemented in a fundamentally different way than I did (e.g. not using multiplexors at all)?
Sure. Consider building a 1-bit ALU that could be replicated N times, depending on the required word width. The processing for one bit of the x input could work like this:
    Not(in=zx, out=Nzx);
    And(a=x, b=Nzx, out=x1);
    Xor(a=x1, b=nx, out=x2);
where x2 is the input to the And and the Adder. It's hard to do this using 16-bit parts in HDL because it's a lot of typing to connect a 1-bit signal to all 16 inputs of a 16-bit part. I suppose you could make special chips like And16x1 that take a 16-bit a and a 1-bit b.
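The three HDL lines above reduce to one bit-level expression, x2 = (x AND NOT zx) XOR nx, which reproduces the Hack zx/nx semantics. A Python model of a single bit (my restatement, using 1-zx for Not(zx)):

```python
def preprocess_x(x, zx, nx):
    """One bit of the x input, conditioned by the zx/nx control bits.

    zx=1 zeroes the bit (And with Not(zx)); nx=1 then flips it (Xor).
    All arguments are single bits (0 or 1).
    """
    x1 = x & (1 - zx)   # And(x, Not(zx)): zeroed when zx=1
    x2 = x1 ^ nx        # Xor(x1, nx): inverted when nx=1
    return x2

# zx=1 forces the result to nx, regardless of x:
assert preprocess_x(1, 1, 0) == 0
assert preprocess_x(0, 1, 1) == 1
# zx=0 passes x through, optionally inverted by nx:
assert preprocess_x(1, 0, 0) == 1
assert preprocess_x(1, 0, 1) == 0
```

Replicating this per bit (with zx/nx fanned out to all N copies) gives the 1-bit-slice ALU structure described above.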

--Mark


Re: ALU - performance implications

Eugeniu
The solution you provided is 2 times faster than using a multiplexor at the input-preprocessing step! I will update my design.
Thanks!

Re: ALU - performance implications

cadet1620
Administrator
For actual hardware, CMOS ICs can implement 4-way multiplexors with 2 transistor delays from inputs to output and 3 transistor delays from selects to output. Not, Nand, and Nor all require a single delay. And, Or, and Xor require 2 transistor delays.

Not-And-Xor will therefore require 5 delays. A faster circuit would be
    Not(in=x, out=Nx);
    Mux4Way(sel[0]=nx, sel[1]=zx, a=x, b=Nx, c=false, d=true, out=x1);
which requires only 3 transistor delays.
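Counting it out with the per-gate transistor delays quoted above (Not=1, And=2, Xor=2; Mux4Way: 2 from data inputs, 3 from selects), a sketch of the bookkeeping I've added for clarity:

```python
# Per-gate transistor delays from the post above.
NOT, AND, XOR = 1, 2, 2
MUX4_FROM_INPUT, MUX4_FROM_SELECT = 2, 3

# Not-And-Xor chain: zx -> Not -> And -> Xor
not_and_xor = NOT + AND + XOR
assert not_and_xor == 5

# Mux version: Not(x) feeds a DATA input (Not + 2 delays through the
# mux), while nx/zx drive the SELECTS directly (3 delays).  The
# critical path is whichever is longer:
mux_version = max(NOT + MUX4_FROM_INPUT, MUX4_FROM_SELECT)
assert mux_version == 3
```

So at the transistor level the mux wins, even though it loses badly when delays are counted in Nand-equivalent gates.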

Also note that N-way Nand and Nor are the same speed as 2-way, so an equal-speed circuit with fewer transistors would be
   

--Mark


Re: ALU - performance implications

Eugeniu
Wait... wait... :) The solution with Not and Mux4Way is the one I did in the first place, but in my calculations, assuming that a Nand's delay equals 1, we have: the Mux4Way16 delay is 11, while your Not-And-Xor solution is just 6.
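The figure of 6 follows from the usual course-style Nand constructions (my assumption, stated for clarity): Not is 1 Nand, And is Nand+Not (2 along the critical path), and the 4-Nand Xor has a critical path of 3. The Mux4Way16 figure of 11 depends on exactly how the Mux is built, so it is not re-derived here.

```python
# Nand-equivalent critical-path delays, assuming the standard
# Nand2Tetris-style constructions (Not=1, And=2, Xor=3).
NOT, AND, XOR = 1, 2, 3

# zx -> Not -> And -> Xor chain, as in Eugeniu's count:
assert NOT + AND + XOR == 6
```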

As I understand it, you are talking about a hardware implementation, which is different. This is very confusing, because the performance figures are so different, and I wanted to use the ideas from nand2tetris to build my own simple PC.

Are you aware of any user group centered around that interest (building a physical PC)? There are several implementations on the internet, but those are really one-man projects. I am looking for some sort of community. Perhaps I could complement the knowledge from nand2tetris with some design considerations using real hardware.

Re: ALU - performance implications

cadet1620
Administrator
I'm not aware of any user groups or online courses aimed at building your own computer from scratch. Searching Google for "building your own computer" only turns up the idea of buying a case, power supply, motherboard, etc. and bolting it all together. Yawn.

A good place to start learning about hardware is http://play-hookey.com/. Also, http://ozark.hendrix.edu/~burch/logisim/ is a nice visual logic simulator that will let you experiment with asynchronous sequential logic. Build the flip-flops you learn about on play-hookey.

If you haven't, read the book "Code" by Petzold.

I just Googled "TTL computer" and got better results. Give it a try. TTL is transistor-transistor logic, a family of ICs that was very popular in the '80s and is still available for hobbyist use. See http://en.wikipedia.org/wiki/7400_series

--Mark

Re: ALU - performance implications

Eugeniu
To add to the discussion about hardware design: I am currently searching for some books to get me started. I haven't found one I would be 100% satisfied with yet, but so far these look promising:

- Books with basics about the tools and some mini electronics projects to get started:
[1] http://www.amazon.com/Starting-Electronics-Construction-Techniques-Equipment/dp/0750667362
[2] http://www.amazon.com/PIC-Microcontrollers-Projects-Beginners-Experts/dp/090570570X
[3] http://www.amazon.com/Make-Electronics-Discovery-Charles-Platt/dp/0596153740/

- Books about computer design
[1] http://retro.hansotten.nl/uploads/z80/BuildYourOwnZ80.pdf - legally available for free :)
[2] http://www.amazon.com/Digital-Logic-Microprocessor-Design-VHDL/dp/0534465935
[3] http://www.amazon.com/Embedded-Computing-Approach-Architecture-Compilers/dp/1558607668
[4] http://www.amazon.com/Computer-Organization-Design-Fourth-Architecture/dp/0123747503

For the lucky people living in the USA, I've found this very cheap microcontroller to play with: http://processors.wiki.ti.com/index.php/MSP430_LaunchPad_(MSP-EXP430G2). It costs less than $5!

I also found this amazing multi-year project by Bill Buzbee: http://homebrewcpu.com/. This guy already did it! Unfortunately, his website is really more of a diary than a step-by-step tutorial for someone just starting with computer design.