Eugeniu wrote
1) Isn't it redundant that to compute a given result the ALU actually computes all possible variants and then selects the correct one? Is this how all ALUs work in the real world or is it just a consequence of the design of this specific ALU?
In hardware it's generally easier to compute multiple values and then select the one you need. The hardware is always there and always computing anyway.
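Here is a rough behavioral sketch in Python of the Hack ALU (16-bit words, control bits zx, nx, zy, ny, f, no) that makes this concrete. The `mux16` helper and the Python modeling are my own illustration, not the course's HDL. Both inputs to every mux are computed first, and the control bits only choose which one passes through, just as the gates would.

```python
MASK = 0xFFFF  # Hack words are 16 bits wide

def mux16(a, b, sel):
    """Behaves like a Mux16: out = b if sel else a. Both a and b already exist."""
    return b if sel else a

def hack_alu(x, y, zx, nx, zy, ny, f, no):
    """Behavioral model of the Hack ALU: compute every candidate, then select."""
    # Stage 1: optionally zero, then optionally bitwise-negate, each input.
    x = mux16(x & MASK, 0, zx)
    x = mux16(x, ~x & MASK, nx)
    y = mux16(y & MASK, 0, zy)
    y = mux16(y, ~y & MASK, ny)
    # Stage 2: the And unit and the adder both produce a result every time;
    # f just picks which one is used.
    and_out = x & y
    add_out = (x + y) & MASK
    out = mux16(and_out, add_out, f)
    # Stage 3: optionally negate the output.
    return mux16(out, ~out & MASK, no)

# x - y is encoded as zx=0 nx=1 zy=0 ny=0 f=1 no=1, i.e. NOT(NOT x + y)
print(hack_alu(5, 3, 0, 1, 0, 0, 1, 1))  # prints 2
```

Results are raw 16-bit patterns, so a negative result such as -2 shows up as 0xFFFE.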
A related question: how was this ALU actually designed?
I don't know how this particular ALU was designed. It's a very neat solution. Most often this sort of design starts with an "aha" moment of inspiration -- for instance, realizing that ~(~a + b) = a - b -- and then working out how the remaining logical operations can be fit into that structure.
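The identity is easy to check. In two's complement, ~v = -v - 1, so ~(~a + b) = -((-a - 1) + b) - 1 = a - b. A quick Python check over 16-bit words follows; the function name `sub_via_add` is just for illustration.

```python
MASK = 0xFFFF  # 16-bit words, as in Hack

def sub_via_add(a, b):
    """Compute a - b using only bitwise NOT and addition: ~(~a + b)."""
    not_a = ~a & MASK
    return ~((not_a + b) & MASK) & MASK

# Compare against ordinary subtraction truncated to 16 bits,
# over a deterministic spread of values covering the whole range.
for a in range(0, 1 << 16, 257):
    for b in range(0, 1 << 16, 263):
        assert sub_via_add(a, b) == (a - b) & MASK
print("~(~a + b) == a - b holds for all sampled 16-bit values")
```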
If I had to design an ALU with the Hack functionality, I would have ended up with separate adder/subtractor and Boolean units, resulting in about twice the hardware.
2) Again, given that all possible results are actually computed, does this mean the CPU frequency is limited by its longest operation, and there is no way to perform the less complicated operations faster? I guess having different speeds for different operations would raise the question of when exactly we would know that the CPU outputs are valid and can be stored in a register. I guess a single clock frequency that accommodates all operations greatly simplifies the design. What do you think?
One of the engineer's jobs is what's called "worst-case timing analysis", which is exactly that: finding the longest path through the various circuits making up the computer. One technique used when some operations are slower than others is to add "wait states" to the slower instructions. Basically, the CPU stalls for the required number of clock cycles before writing the result to memory and incrementing the program counter.
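A toy illustration of the two approaches, in Python. The path delays and the 10 ns clock are made-up numbers, and `wait_states` is a hypothetical helper; the point is only the arithmetic. With a single clock, the period must cover the slowest path. With wait states, a faster clock is used and each instruction holds for ceil(delay / period) cycles, i.e. that many minus one extra cycles.

```python
import math

def wait_states(path_delay_ns, clock_period_ns):
    """Extra clock cycles an instruction needs beyond the first one."""
    cycles = math.ceil(path_delay_ns / clock_period_ns)
    return max(cycles, 1) - 1

# Hypothetical worst-case path delays (ns) per operation, illustration only.
delays = {"and": 7.0, "add": 18.0, "slow_op": 35.0}

# Option 1: one clock sized for the slowest path (worst-case timing).
single_period = max(delays.values())
print("single-clock period:", single_period, "ns for every instruction")

# Option 2: a faster clock; slow instructions get wait states.
period = 10.0
for op, d in delays.items():
    print(op, "needs", wait_states(d, period), "wait state(s)")
```

With these numbers "and" needs no wait states, "add" needs one, and "slow_op" needs three, so the fast operations are no longer held back by the slowest one.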
By the way, could the ALU be implemented in a fundamentally different way than I did it (e.g. without using multiplexors at all)?
Sure. Consider building a 1-bit ALU that could be replicated N times, depending on the required word width. The processing for one bit of the x input could work like this:
Not(in=zx, out=Nzx);       // Nzx = 1 unless x is being zeroed
And(a=x, b=Nzx, out=x1);   // x1 = 0 if zx, else x
Xor(a=x1, b=nx, out=x2);   // x2 = NOT x1 if nx, else x1
where x2 feeds both the And and the Adder. It's hard to do this using 16-bit parts in HDL because it takes a lot of typing to connect a 1-bit signal to all 16 inputs of a 16-bit part. I suppose you could make special chips like And16x1 that would take a 16-bit a and a 1-bit b.
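Here's a Python model of that idea. Names like `bit_slice_x` are mine, and `and16x1`/`xor16x1` model the hypothetical And16x1-style chips, with the 1-bit input copied to all 16 bits. It checks that the gate-only slice, replicated 16 times, gives the same zx/nx behavior as the mux version.

```python
MASK = 0xFFFF
WIDTH = 16

def bit_slice_x(x_bit, zx, nx):
    """One-bit slice, mirroring the HDL: x2 = (x AND NOT zx) XOR nx."""
    nzx = 1 - zx        # Not(in=zx, out=Nzx)
    x1 = x_bit & nzx    # And(a=x, b=Nzx, out=x1)
    return x1 ^ nx      # Xor(a=x1, b=nx, out=x2)

def preprocess_x_sliced(x, zx, nx, width=WIDTH):
    """Replicate the 1-bit slice across every bit of the word; no muxes."""
    out = 0
    for i in range(width):
        out |= bit_slice_x((x >> i) & 1, zx, nx) << i
    return out

def and16x1(a, b):
    """Hypothetical And16x1: 16-bit a AND'ed with 1-bit b copied to all bits."""
    return a & (MASK if b else 0)

def xor16x1(a, b):
    """Hypothetical Xor16x1: 16-bit a XOR'ed with 1-bit b copied to all bits."""
    return a ^ (MASK if b else 0)

def preprocess_x_wide(x, zx, nx):
    """Same slice logic built from the hypothetical 16x1 chips."""
    return xor16x1(and16x1(x & MASK, 1 - zx), nx)

def preprocess_x_mux(x, zx, nx):
    """Reference: the mux-style zx/nx stage (zero x, then optionally negate)."""
    x = 0 if zx else x & MASK
    return (~x & MASK) if nx else x

for x in (0, 1, 0x1234, 0x8000, 0xFFFF):
    for zx in (0, 1):
        for nx in (0, 1):
            ref = preprocess_x_mux(x, zx, nx)
            assert preprocess_x_sliced(x, zx, nx) == ref
            assert preprocess_x_wide(x, zx, nx) == ref
print("all slice variants agree")
```

The same slice pattern applies to y with zy and ny, and the final no step is likewise an Xor with a 1-bit control.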
--Mark