
I went ahead and implemented a carry lookahead 4bit adder and I wanted to know if I am on the right track. I tried to include helpful comments, for example the boolean calculations for each carry bit calculation. The last calculation is quite verbose, and I find it hard to understand why this would be quicker than the ripple chip containing "less code".
First I created a chip called a partial full adder that will take an input of x, y, and a carry bit. It will output a sum bit, a generate bit, and a propagate bit.
CHIP PFA {
IN a, b, c;
OUT s, g, p;
PARTS:
//Put your code here.
//Calculate the Generate Bit
And(a=a, b=b, out=g);
//Calculate the Propagation Bit
Or(a=a, b=b, out=p);
//Calculate the SumBit
Xor(a=a, b=c, out=s1);
Xor(a=s1, b=b, out=s);
}
Next I created the 4bit look forward adder which i called "Add4" for simplicities sake. Maybe I should rename it to Add4CLA. After each partial full add, I went ahead and calculated the carry bit, including the boolean logic in a comment for your convenience. It is that step that truly baffles me. I would love some further explanation as to why, although this seems to involve more calculation, mathematically it should result in fewer gates and generation in a constant time (rather than linear as ripple is done).
CHIP Add4 {
IN a[4], b[4], c0;
OUT out[4], carry;
PARTS:
//Put your code here.
PFA(a=a[0], b=b[0], c=c0, s=out[0], g=g0, p=p0);
//C1 = g0+p0c0
And(a=p0, b=c0, out=pand1);
Or(a=pand1, b=g0, out=c1);
PFA(a=a[1], b=b[1], c=c1, s=out[1], g=g1, p=p1);
//C2 = g1 + p1g0 + p1p0c0
And(a=p1, b=pand1, out=pand2);
And(a=p1, b=g0, out=gand1);
Or(a=g1, b=pand2, out=or1);
Or(a=or1, b=pand2, out=c2);
PFA(a=a[2], b=b[2], c=c2, s=out[2], g=g2, p=p2);
//C3 = g2 + p2g1 + p2p1g0 + p2p1p0c0
And(a=p2, b=pand2, out=pand3);
And(a=p2, b=gand1, out=gand2);
And(a=p2, b=g1, out=ggand1);
Or(a=pand3, b=gand2, out=or2);
Or(a=or2, b=ggand1, out=or3);
Or(a=or3, b=g2, out=c3);
PFA(a=a[3], b=b[3], c=c3, s=out[3], g=g3, p=p3);
//c4 = g3 + p3g2 + p2p3g1 + p1p2p3g0 + p0p1p2p3c0
And(a=p3, b=pand3, out=pand4);
And(a=p3, b=gand2, out=gand3);
And(a=p3, b=ggand1, out=ggand2);
And(a=p3, b=g2, out=ppand1);
Or(a=g3, b=ppand1, out=or4);
Or(a=or4, b=ggand2, out=or5);
Or(a=or5, b=gand3, out=or6);
Or(a=or6, b=pand4, out=carry);
}
Finally I have my Add16 chip. Since it is difficult to string long chains of carry look ahead together, I am wondering if the speedup will be found not in the number of adders strung together, but rather it will be realized when thousands of iterations have been run over each adder type. (E.G.  Ripple versus Carry Lookahead). Your thoughts are appreciated.
Note: I left my old implementation commented out in the Add16 for comparison's sake.
CHIP Add16 {
IN a[16], b[16];
OUT out[16];
PARTS:
//Second version with 4bit CLA and ripple carry between 4bit adders;
Add4(a=a[0..3], b=b[0..3], c0=false, out=out[0..3], carry=c1);
Add4(a=a[4..7], b=b[4..7], c0=c1, out=out[4..7], carry=c2);
Add4(a=a[8..11], b=b[8..11], c0=c2, out=out[8..11], carry=c3);
Add4(a=a[12..15], b=b[12..15], c0=c3, out=out[12..15], carry=c4);
// Put your code here.
//First Version with ripple carry
/* HalfAdder(a=a[0], b=b[0], sum=out[0], carry=c1);
FullAdder(a=a[1], b=b[1], c=c1, sum=out[1], carry=c2);
FullAdder(a=a[2], b=b[2], c=c2, sum=out[2], carry=c3);
FullAdder(a=a[3], b=b[3], c=c3, sum=out[3], carry=c4);
FullAdder(a=a[4], b=b[4], c=c4, sum=out[4], carry=c5);
FullAdder(a=a[5], b=b[5], c=c5, sum=out[5], carry=c6);
FullAdder(a=a[6], b=b[6], c=c6, sum=out[6], carry=c7);
FullAdder(a=a[7], b=b[7], c=c7, sum=out[7], carry=c8);
FullAdder(a=a[8], b=b[8], c=c8, sum=out[8], carry=c9);
FullAdder(a=a[9], b=b[9], c=c9, sum=out[9], carry=c10);
FullAdder(a=a[10], b=b[10], c=c10, sum=out[10], carry=c11);
FullAdder(a=a[11], b=b[11], c=c11, sum=out[11], carry=c12);
FullAdder(a=a[12], b=b[12], c=c12, sum=out[12], carry=c13);
FullAdder(a=a[13], b=b[13], c=c13, sum=out[13], carry=c14);
FullAdder(a=a[14], b=b[14], c=c14, sum=out[14], carry=c15);
FullAdder(a=a[15], b=b[15], c=c15, sum=out[15], carry=c16);*/
}
