Jorge wrote:
I figured it was much easier to implement Mux4Way16 if I designed a Mux chip that would select from 4 inputs. I called this one Mux4.
My initial implementation of Mux4Way16 was similar to yours, but I eventually switched to Mux16s because the HDL is much easier to read with 3 parts than with 16.
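
To show why three parts are enough, here is a rough behavioral model in Python rather than HDL. It is only a sketch under some assumptions: the function names `mux16` and `mux4way16` are made up for illustration, each 16-bit bus is represented as a plain integer, and `sel` follows the usual convention where bit 0 is the low bit (so sel = 0, 1, 2, 3 picks a, b, c, d).

```python
def mux16(a: int, b: int, sel: int) -> int:
    """Model of a 16-bit 2-way mux: returns a when sel is 0, b when sel is 1."""
    chosen = b if sel else a
    return chosen & 0xFFFF  # keep the result within 16 bits


def mux4way16(a: int, b: int, c: int, d: int, sel: int) -> int:
    """Model of a 16-bit 4-way mux built from three 2-way muxes.

    sel is a 2-bit value (0..3). Assumed convention: bit 0 chooses
    within a pair, bit 1 chooses between the pairs.
    """
    sel0 = sel & 1         # low bit of sel
    sel1 = (sel >> 1) & 1  # high bit of sel

    ab = mux16(a, b, sel0)      # part 1: a or b
    cd = mux16(c, d, sel0)      # part 2: c or d
    return mux16(ab, cd, sel1)  # part 3: choose between the two pairs


if __name__ == "__main__":
    inputs = (0x1111, 0x2222, 0x3333, 0x4444)
    for sel in range(4):
        print(sel, hex(mux4way16(*inputs, sel)))
```

The first two muxes share the low select bit and the third uses the high bit, so the whole chip is three lines of parts instead of one part per output bit.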