|
|
Hi everyone,
I am curious about what the simplest and most elegant implementation of the ALU possible is, using only the chips previously defined in the text. Let's define "simplest" as using the fewest number of parts, i.e. the fewest lines of code in the PARTS section of the HDL file.
I know that it is possible to do using 13 parts (13 lines of code in the PARTS section). However, there is one piece of this minimal 13-part implementation that feels inelegant—as if a specific chip is missing from the set of chips previously defined in Chapter 1.
If anyone has found a way to implement the ALU in less than 13 lines with only the chips previously defined in the text, please share how many lines (parts) you used. I would be very curious to know how it's done.
|
Administrator
|
That depends entirely one what your metric is for "simple" and "elegant".
You are defining "simplest" as the fewest lines of code in the PARTS section of the HDL file.
Okay, which HDL file? Just the ALU.hdl? If your ALU.hdl had, say, 13 parts but those parts totaled out having 100 parts, is that really simpler than an ALU.hdl file that had 15 parts but those totaled out having only 90?
FWIW, the Hack hardware really doesn't lend itself to these kinds of efforts. In CMOS, a Nand gate uses four transistors. So consider an And and an Or gate. The And gate is two Nand gates (8 transistors) while the Or gate is likely implemented by most people using one And gate and three Not gates, which total out to five Nand gates (20 transistors). But each is just one "part" in an HDL file. In reality, both and And and an Or are made (in CMOS) using 6 transistors.
I'm pretty sure I know which part you think is missing and I suspect the reason that it is left out is that the authors wanted one portion of the logic to require some thought as to how to identify a need for a function that isn't explicitly specified and then figure out how to implement it.
But there are other "missing" parts that would make the ALU quite a bit smaller and faster overall.
|
|
I know this is outside the scope of nand2tetris, but you can easily implement AND with 2 transistors, and not with 1 (making nand 3 transistors since it is not(and)).
Is there a reason why nand would use 4 transistors instead of 3?
I've realised, by looking it up, that you can also implement nand itself with 2 transistors, but I suspect this circuit diagram to be incorrect as its using the same transistors in the same direction:
https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fi.stack.imgur.com%2F2GSIm.png&f=1&nofb=1I'm making a guess the resistor placement is doing something I'm not accounting for as it's been a long time since I've had the stuff to mess with my breadboard (most of my electronics prototyping stuff was stolen by a roommate)
|
Administrator
|
SirGouki wrote
I know this is outside the scope of nand2tetris, but you can easily implement AND with 2 transistors, and not with 1 (making nand 3 transistors since it is not(and)).
Is there a reason why nand would use 4 transistors instead of 3?
I've realised, by looking it up, that you can also implement nand itself with 2 transistors, but I suspect this circuit diagram to be incorrect as its using the same transistors in the same direction:
https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fi.stack.imgur.com%2F2GSIm.png&f=1&nofb=1I'm making a guess the resistor placement is doing something I'm not accounting for as it's been a long time since I've had the stuff to mess with my breadboard (most of my electronics prototyping stuff was stolen by a roommate)
The circuits you show are what are known as RTL -- Resistor Transistor Logic. They rely one the circuitry actively pulling the output in one direction, but passively pulling it the other. Furthermore, the input characteristic are very different depending on if you are driving the input HI or driving it LO. This results in highly asymmetric input and output characteristics of each gate; furthermore, the characteristics are depending on the specific type of gate -- a NAND gate has very different characteristics that a NOR gate, for instance. This results in many issues that prevent it from being scalable to large circuits.
The most common form of digital circuitry today is CMOS -- Complementary Metal Oxide Semiconductor. This approach uses complementary pull-up and pull-down structures to create symmetric input and output characteristics that actively assert either a HI or a LO output and that are largely independent of the specific type of gate. This makes designing large digital circuits very scalable.
The common basic CMOS logic structures are shown in this article:
https://learn.digilentinc.com/Documents/313Look particularly at Figure 4. Notice that in each circuit, each input is driving one PFET gate and one NFET gate. If all of the circuits use the same size NFETs and PFETs, then the input characteristics of every gate type will be identical. For many logic libraries this is done, in part for this reason. But for more critical situations it becomes important get the output characteristics to be the same across different gate types and that requires playing with the transistor gate sizes, which will make the input characteristics logic gate-dependent. But that can be handled by adding dummy gates to equalize the total capacitance seen by any input to be a constant.
Another advantage of CMOS is the extremely huge fanout that is possible. If speed isn't an issue, you can literally use a single CMOS logic gate output to drive millions of other logic gate inputs (I've done it on numerous chip designs).
|
|
WBahn wrote
That depends entirely one what your metric is for "simple" and "elegant".
You are defining "simplest" as the fewest lines of code in the PARTS section of the HDL file.
Okay, which HDL file? Just the ALU.hdl? If your ALU.hdl had, say, 13 parts but those parts totaled out having 100 parts, is that really simpler than an ALU.hdl file that had 15 parts but those totaled out having only 90?
FWIW, the Hack hardware really doesn't lend itself to these kinds of efforts. In CMOS, a Nand gate uses four transistors. So consider an And and an Or gate. The And gate is two Nand gates (8 transistors) while the Or gate is likely implemented by most people using one And gate and three Not gates, which total out to five Nand gates (20 transistors). But each is just one "part" in an HDL file. In reality, both and And and an Or are made (in CMOS) using 6 transistors.
I'm pretty sure I know which part you think is missing and I suspect the reason that it is left out is that the authors wanted one portion of the logic to require some thought as to how to identify a need for a function that isn't explicitly specified and then figure out how to implement it.
But there are other "missing" parts that would make the ALU quite a bit smaller and faster overall.
That's true, I am defining "simple" as using the fewest number of previously constructed "chips" (fewest lines of code in the PARTS section of the HDL file).
However I feel like this is a natural definition based on the course's emphasis on the abstraction-implementation paradigm. To me the simplest and most elegant construction is the one which relies on the fewest number of previously constructed sub-modules (chips in this case), so that you can simply concern yourself with how to connect the sub-modules together without worrying about how they work or are implemented internally. To me, that idea is one of the fundamental themes of the course.
Obviously you could build an implementation that is "simpler" or more "elegant" in terms of gate/transistor count, time delay, area, power, etc. if you were willing to flatten the hierarchy and design the ALU at the transistor level, but in my mind, that increases the complexity of the design (to the benefit of other metrics like size/speed) because it would no longer be composed of a small number of easy-to-understand parts.
|
Administrator
|
Consider two cases: You have the CPU as you have currently designed it (call it CPU_A) and you have another version (CPU_B) in which your decode logic is poorly design and uses an additional ten logic parts. Which CPU is "simpler", the A version of the B version?
Now, you take the B version and package up all of the decode logic into a single part and then instantiate that one decode part in a new design called CPU_C instead of the all of the individual logic parts. Which is simpler now, CPU_A or CPU_C?
Based on your metric, you would have to say that CPU_C is simpler than CPU_A, which is in turn simpler than CPU_B.
However, if I were to manufacturer all three, CPU_B and CPU_C would be indistinguishable.
As an aside, there's no need to flatten the hierarchy to get the improvement in other metrics -- they are by no means mutually exclusive.
Be that as it may, what you are saying about how we partition the design to control the hierarchy has some merit. While it is reasonable to say that having fewer parts TENDS to make the logic at that level of abstraction clearer, this is not always the case and it's a dangerous trap to get too wedded to that notion. What is far more valuable is to have the parts at a given level of abstraction match a partitioning of the project into smaller portions that form a coherent picture at that level of abstraction. Sometimes combining two things can make it harder to understand how things work because the interaction is valuable at the current level of abstraction.
Consider the top level computer. There are two memories, a ROM and a RAM. We could very easily combined these into a single part named MEMORY connected to a part named CPU with six lines going back and forth between them (plus a separate reset signal going into the CPU). But does this really aid in an understanding of the design at this top level of abstraction? Or is a better understanding of the design, at this level, to be had by seeing one part that handles the flow and interaction of the CPU with the instructions and a separate part that handles the flow and interaction of the CPU with the data?
|
|