Nand2Tetris Questions and Answers Forum › Hardware › Chapter 2

Carry Lookahead Implementation

Classic

List

Threaded

7 messages Options

Shjanzey

Carry Lookahead Implementation

I went ahead and implemented a carry lookahead 4-bit adder and I wanted to know if I am on the right track. I tried to include helpful comments, for example the boolean calculations for each carry bit calculation. The last calculation is quite verbose, and I find it hard to understand why this would be quicker than the ripple chip containing "less code".

First I created a chip called a partial full adder that will take an input of x, y, and a carry bit. It will output a sum bit, a generate bit, and a propagate bit.

CHIP PFA {
IN a, b, c;
OUT s, g, p;

PARTS:
//Put your code here.

//Calculate the Generate Bit
And(a=a, b=b, out=g);

//Calculate the Propagation Bit
Or(a=a, b=b, out=p);

//Calculate the Sum-Bit
Xor(a=a, b=c, out=s1);
Xor(a=s1, b=b, out=s);

}

Next I created the 4-bit look forward adder which i called "Add4" for simplicities sake. Maybe I should rename it to Add4CLA. After each partial full add, I went ahead and calculated the carry bit, including the boolean logic in a comment for your convenience. It is that step that truly baffles me. I would love some further explanation as to why, although this seems to involve more calculation, mathematically it should result in fewer gates and generation in a constant time (rather than linear as ripple is done).

CHIP Add4 {
IN a[4], b[4], c0;
OUT out[4], carry;

PARTS:
//Put your code here.

PFA(a=a[0], b=b[0], c=c0, s=out[0], g=g0, p=p0);

//C1 = g0+p0c0
And(a=p0, b=c0, out=pand1);
Or(a=pand1, b=g0, out=c1);

PFA(a=a[1], b=b[1], c=c1, s=out[1], g=g1, p=p1);

//C2 = g1 + p1g0 + p1p0c0
And(a=p1, b=pand1, out=pand2);
And(a=p1, b=g0, out=gand1);
Or(a=g1, b=pand2, out=or1);
Or(a=or1, b=pand2, out=c2);

PFA(a=a[2], b=b[2], c=c2, s=out[2], g=g2, p=p2);

//C3 = g2 + p2g1 + p2p1g0 + p2p1p0c0
And(a=p2, b=pand2, out=pand3);
And(a=p2, b=gand1, out=gand2);
And(a=p2, b=g1, out=ggand1);
Or(a=pand3, b=gand2, out=or2);
Or(a=or2, b=ggand1, out=or3);
Or(a=or3, b=g2, out=c3);

PFA(a=a[3], b=b[3], c=c3, s=out[3], g=g3, p=p3);

//c4 = g3 + p3g2 + p2p3g1 + p1p2p3g0 + p0p1p2p3c0
And(a=p3, b=pand3, out=pand4);
And(a=p3, b=gand2, out=gand3);
And(a=p3, b=ggand1, out=ggand2);
And(a=p3, b=g2, out=ppand1);
Or(a=g3, b=ppand1, out=or4);
Or(a=or4, b=ggand2, out=or5);
Or(a=or5, b=gand3, out=or6);
Or(a=or6, b=pand4, out=carry);

}

Finally I have my Add16 chip. Since it is difficult to string long chains of carry look ahead together, I am wondering if the speedup will be found not in the number of adders strung together, but rather it will be realized when thousands of iterations have been run over each adder type. (E.G. -- Ripple versus Carry Lookahead). Your thoughts are appreciated.

Note: I left my old implementation commented out in the Add16 for comparison's sake.

CHIP Add16 {
IN a[16], b[16];
OUT out[16];

PARTS:
//Second version with 4-bit CLA and ripple carry between 4bit adders;

Add4(a=a[0..3], b=b[0..3], c0=false, out=out[0..3], carry=c1);
Add4(a=a[4..7], b=b[4..7], c0=c1, out=out[4..7], carry=c2);
Add4(a=a[8..11], b=b[8..11], c0=c2, out=out[8..11], carry=c3);
Add4(a=a[12..15], b=b[12..15], c0=c3, out=out[12..15], carry=c4);

// Put your code here.
//First Version with ripple carry
/* HalfAdder(a=a[0], b=b[0], sum=out[0], carry=c1);
FullAdder(a=a[1], b=b[1], c=c1, sum=out[1], carry=c2);
FullAdder(a=a[2], b=b[2], c=c2, sum=out[2], carry=c3);
FullAdder(a=a[3], b=b[3], c=c3, sum=out[3], carry=c4);
FullAdder(a=a[4], b=b[4], c=c4, sum=out[4], carry=c5);
FullAdder(a=a[5], b=b[5], c=c5, sum=out[5], carry=c6);
FullAdder(a=a[6], b=b[6], c=c6, sum=out[6], carry=c7);
FullAdder(a=a[7], b=b[7], c=c7, sum=out[7], carry=c8);
FullAdder(a=a[8], b=b[8], c=c8, sum=out[8], carry=c9);
FullAdder(a=a[9], b=b[9], c=c9, sum=out[9], carry=c10);
FullAdder(a=a[10], b=b[10], c=c10, sum=out[10], carry=c11);
FullAdder(a=a[11], b=b[11], c=c11, sum=out[11], carry=c12);
FullAdder(a=a[12], b=b[12], c=c12, sum=out[12], carry=c13);
FullAdder(a=a[13], b=b[13], c=c13, sum=out[13], carry=c14);
FullAdder(a=a[14], b=b[14], c=c14, sum=out[14], carry=c15);
FullAdder(a=a[15], b=b[15], c=c15, sum=out[15], carry=c16);*/

}

cadet1620

Re: Carry Lookahead Implementation

Administrator

The speed improvement in the Carry Lookahead Adder comes into play as the adder gets wider.

For a Ripple Carry Adder, the worst case delay for an N-bit adder is from C_IN in to Sum_N through all the bit-to-bit carries. The delay time is k₁ N for some constant k₁.

For a CLA Adder, the worst case delay path gets from C_IN in to Sum_N by going up the tree of Lookahead Generators and back down to the Sum_N adder. The delay time is k₂ log₂N.

k₂ is typically larger than k₁ so there is a tradeoff point between the two. Often, the smallest building blocks of the CLA adder, usually 4 bits wide, are implemented with ripple carry add and parallel G, P computation.

--Mark

Shjanzey

Re: Carry Lookahead Implementation

Ok, this makes a lot more sense. Thank you for that clarification.

Oh by the way I just realized I can simplify the Add4 program above drastically. I worked some more on the algebra and then laughed at what I was doing. I actually recalculate previous carry pins when they are already available. With your reply and my corrected math I can now understand the elegance of this method. I'll post an update version.

CHIP Add4 {
IN a[4], b[4], c0;
OUT out[4], carry;

PARTS:
//Put your code here.

PFA(a=a[0], b=b[0], c=c0, s=out[0], g=g0, p=p0);

//C1 = g0+p0c0
And(a=p0, b=c0, out=pand1);
Or(a=pand1, b=g0, out=c1);

PFA(a=a[1], b=b[1], c=c1, s=out[1], g=g1, p=p1);

//C2 = g1 + p1g0 + p1p0c0 --> g1 + p1(g0+p0c0) --> g1 + p1c1

And(a=p1, b=c1, out=pand2);
Or(a=g1, b=pand2, out=c2);

PFA(a=a[2], b=b[2], c=c2, s=out[2], g=g2, p=p2);

//C3 = g2 + p2g1 + p2p1g0 + p2p1p0c0 --> g2+p2(g1+p1(g0+p0c0)) --> g2+p2c2

And(a=p2, b=c2, out=pand3);
Or(a=g2, b=pand3, out=c3);

PFA(a=a[3], b=b[3], c=c3, s=out[3], g=g3, p=p3);

//c4 = g3 + p3g2 + p2p3g1 + p1p2p3g0 + p0p1p2p3c0

And(a=p3, b=c3, out=pand4);
Or(a=g3, b=pand4, out=carry);

}

cadet1620

Re: Carry Lookahead Implementation

Administrator

In reply to this post by Shjanzey

If you are interested in other types of adder optimization, check out this post on Carry Select Adders

--Mark

Jaytron

Re: Carry Lookahead Implementation

Has anyone had any success implementing a full 16 bit carry lookahead adder? I've successfully implemented the 4-bit carry lookahead adder in .hdl code as visualized here: http://en.wikipedia.org/wiki/Lookahead_Carry_Unit but when I attempt to scale the chip up to a 16-bit lookahead carry unit the Nand2Tetris hardware simulator gives me the error that I have a "circle in my parts connection".

It's basically the group propagate and group generate outputs from four 4-bit carry lookahead chips piped into a 5th carry lookahead chip (therefore accepting 16 bits as input). I've narrowed down the error to the carry bits from the 5th carry lookahead chip, since they are fed in to the four upper level carry lookahead chips.

I'm at a bit of a loss since I don't believe that my logic diagrams nor implementation are wrong (I've built/rebuilt it in it's entirety twice).

Does any have any experience or suggestions for this?

Thanks in advance!

cadet1620

Re: Carry Lookahead Implementation

Administrator

Jaytron wrote

Has anyone had any success implementing a full 16 bit carry lookahead adder? I've successfully implemented the 4-bit carry lookahead adder in .hdl code as visualized here: http://en.wikipedia.org/wiki/Lookahead_Carry_Unit but when I attempt to scale the chip up to a 16-bit lookahead carry unit the Nand2Tetris hardware simulator gives me the error that I have a "circle in my parts connection".

I'm assuming that you are trying to implement the second figure on the Wiki page you referenced.

You are running into a limitation of the Hardware Simulator. It sees that there are outputs p4 and g4 from the 4-bit CLA adder that connect to the 16-bit CLA chip, and the C4 output from the 16-bit CLA that connects back to the same 4-bit CLA adder. It doesn't analyze the internals of the CLA adder so it doesn't know that the C4 input doesn't affect the p4 and g4 outputs.

A quick and dirty kludge is to use two CLA adders for each 4-bit group. One to generate the p and g and the other to add A, B and C. From my CLAadd16.hdl:

    ClaAdd4(a=a[0..3],   b=b[0..3],   c=c,  sum=out[0..3]);
    ClaAdd4(a=a[0..3],   b=b[0..3],                         g=g0, p=p0);

(There's a comment in my HDL explaining this kludge. The comment is longer than the implementation!)

If you want to explore more complicated logic I highly recommend Logisim, a fairly powerful schematic-based logic simulator.

--Mark

Jaytron

Re: Carry Lookahead Implementation

Thanks for the advice! I'm not sure why I didn't occur to me before, but I never tried to make a modified 1-Bit Full Adder to handle the propagate/generate bits internally and then plug them directly in to the ClaAdd4 chip.

As it stands right now, my 4-Bit CLA chip accepts only a C0 (carry-in bit) and two 4 bit numbers to add. It outputs the 4-Bit sum, the C4 bit to shift to the next group, and the GP and GG bits.

I just scaled it up to a 16-Bit chip and it works! Working on the test scripts for it now.

I really appreciate the help and I'm excited to check out Logisim.

-Jason