CHIP DMux8Way {
    IN in, sel[3];
    OUT a, b, c, d, e, f, g, h;
    PARTS:
    // This works
    // DMux(in=in, sel=sel[2], a=outA, b=outB);
    // DMux4Way(in=outA, sel=sel[0..1], a=a, b=b, c=c, d=d);
    // DMux4Way(in=outB, sel=sel[0..1], a=e, b=f, c=g, d=h);
    // This does not work: "an internal pin may only be fed once by a parts output in"
    DMux(in=in, sel=sel[2], a=outA, b=outB);
    DMux(in=outA, sel=sel[1], a=outA1, b=null);
    DMux(in=outB, sel=sel[1], a=null, b=outB1);
    DMux4Way(in=outA1, sel=sel[0], a=a, b=b, c=c, d=d);
    DMux4Way(in=outB1, sel=sel[0], a=e, b=f, c=g, d=h);
}
I am wondering why the former implementation works but the latter doesn't. I don't really understand how the first implementation works: I understand what an 8-way demux is meant to do, but I am not sure how the way I built it actually achieves that.
My first attempt was the latter implementation. I can understand why I was previously getting the error "Sel (1) and sel (2) have different bus widths", since I was using sel[2] once and sel[1] once when, as I understand it, sel[2] is double the width of sel[1]. Now that I have fixed that error, I get the internal pin error instead.
The latter design makes sense to me: both DMux parts that use sel[1] would feed their output pins into the DMux4Way chips, and the DMux4Way chips would then produce the final outputs. I know that the former implementation is more efficient since it uses fewer chips, but I just don't fully understand how it works.
I watched a video on implementing the chips, and the presenter's way made sense to me. I waited a day, then redid the gate on my own from memory, but I have this nagging feeling that I just don't fully understand how it actually works.
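
For comparison, here is a minimal sketch of the three-level tree that the second attempt seems to be aiming for, built only from 1-bit DMux parts. It assumes the standard nand2tetris DMux interface (in, sel, a, b). The internal pin names lo, hi, ab, cd, ef and gh are made up for illustration, and I have not run this through the hardware simulator. In this layout each internal pin is driven by exactly one part output, and each level consumes exactly one bit of sel.

```hdl
CHIP DMux8Way {
    IN in, sel[3];
    OUT a, b, c, d, e, f, g, h;

    PARTS:
    // Level 1: sel[2] picks the half (a..d when 0, e..h when 1).
    DMux(in=in, sel=sel[2], a=lo, b=hi);

    // Level 2: sel[1] picks a pair within each half.
    // lo, hi, ab, cd, ef, gh are hypothetical internal pin names.
    DMux(in=lo, sel=sel[1], a=ab, b=cd);
    DMux(in=hi, sel=sel[1], a=ef, b=gh);

    // Level 3: sel[0] picks the single output within each pair.
    DMux(in=ab, sel=sel[0], a=a, b=b);
    DMux(in=cd, sel=sel[0], a=c, b=d);
    DMux(in=ef, sel=sel[0], a=e, b=f);
    DMux(in=gh, sel=sel[0], a=g, b=h);
}
```

Read this way, the working version looks like the same tree with the two lower levels folded into DMux4Way, which consumes sel[0..1] in a single step.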