First, there is no "correct" approach. If your chip passes the test, it's good.
You can write more concise implementations by using higher-level abstractions. Think about how you can use 3 DMux chips to make a DMux4Way. You want the data path to look something like this:
        in
        |
        *
       / \
      /   \
     /     \
    *       *
   / \     / \
  |   |   |   |
  a   b   c   d
The trick is what you do with the DMuxes' sel inputs to get the data where it needs to go.
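If it helps to experiment before writing HDL, the tree can be simulated in software. The Python below is only a rough sketch, not HDL and not the only wiring that passes the test. The function names dmux and dmux4way, and the split of sel into sel1 (high bit) and sel0 (low bit), are assumptions made for illustration. It also assumes the usual convention that sel = 00 routes in to a, 01 to b, 10 to c, and 11 to d.

```python
def dmux(inp, sel):
    """1-to-2 demultiplexer on single bits: inp goes to the first
    output when sel == 0 and to the second output when sel == 1."""
    first = inp & (1 - sel)
    second = inp & sel
    return first, second


def dmux4way(inp, sel1, sel0):
    """Hypothetical 1-to-4 demultiplexer built from three dmux calls.

    sel1 is assumed to be the high select bit, sel0 the low one.
    """
    # Top of the tree: the high bit picks the (a, b) half or the (c, d) half.
    left, right = dmux(inp, sel1)
    # Bottom of the tree: the low bit picks one output within each half.
    a, b = dmux(left, sel0)
    c, d = dmux(right, sel0)
    return a, b, c, d


if __name__ == "__main__":
    # Print the full truth table for in = 1.
    for sel1 in (0, 1):
        for sel0 in (0, 1):
            print(sel1, sel0, dmux4way(1, sel1, sel0))
```

Only one of the two lower stages ever receives a 1, so both can share the same select bit without interfering with each other.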
--Mark