The second implementation works fine. Try restarting the hardware simulator and loading the test script directly. The simulators and emulators are sometimes left in a bad state and won't run correctly if you rerun a script.
To understand why the second version is correct, you need to know that hardware bits are numbered from right to left, starting at 0 with the least significant bit. This is so that when a bus is carrying a binary number, bus[n] is the 2^n-weighted bit of the number.
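If it helps to see the numbering outside of HDL, here is a small Python sketch of the same idea. The helper names bus_bits and bus_value are made up for illustration and are not part of the course tools; the 16-bit width is just the usual bus size in the project.

```python
def bus_bits(value, width=16):
    """Return a list where index n holds the 2^n-weighted bit of value."""
    return [(value >> n) & 1 for n in range(width)]


def bus_value(bits):
    """Rebuild the number from its bits: bit n contributes bit * 2^n."""
    return sum(bit << n for n, bit in enumerate(bits))


bus = bus_bits(0b0000000000000101)  # the number 5

# bus[0] is the rightmost (least significant) bit of the written number.
print(bus[0], bus[1], bus[2])  # 1 0 1

# Written out as a binary number, bus[15] comes first and bus[0] last.
print("".join(str(bus[n]) for n in reversed(range(16))))  # 0000000000000101
```

Note that the list index and the written position run in opposite directions: index 0 is the last character when the number is written out.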
[After your problem is resolved, please edit your post to remove the working code; we want students to discover their own solutions.]
You should develop the gates in the order described in the book to avoid such issues. The Mux16 tests aren't exhaustive by themselves. They rely on the Mux tests having passed already, which wasn't the case for you.