I wrote a functional ALU (passes all tests) but I'm pretty certain I did it in the most inefficient way possible.
Basically, the ALU actually performs all the computations, but makes use of a Mux16 chip at each step to decide whether the next step should use the newly computed value or the value before it. It only took 15 lines of HDL and was surprisingly easy to implement (I had a harder time with some much simpler chips).
I will now try to make it more efficient, but I was curious to know if anyone else did something like this.
I'd recommend just doing it per the specification that is given -- that's a good road map.
It's a bit hard to tell from your description, but it sounds like this might be what you are doing.
For instance, if it says that a certain signal either flips the bits or not, then use a MUX controlled by that signal and in one input bring in the data and in the other input bring in the Not of that data. Let the control bit choose which one goes forward.
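To make the "compute both, let the Mux choose" idea concrete, here is a rough behavioral model in Python rather than HDL. It is only a sketch: the helper names (mux16, not16, alu) are made up for illustration, and it assumes the Mux16 convention that sel=0 passes the first input and sel=1 passes the second.

```python
MASK = 0xFFFF  # keep every value on a 16-bit bus


def mux16(a, b, sel):
    """Hypothetical helper: pass a when sel is 0, b when sel is 1."""
    return b if sel else a


def not16(a):
    """Hypothetical helper: bitwise Not, truncated to 16 bits."""
    return ~a & MASK


def alu(x, y, zx, nx, zy, ny, f, no):
    # Every stage computes its alternative unconditionally;
    # the control bit only decides which value goes forward.
    x = mux16(x, 0, zx)          # zx: zero the x input
    x = mux16(x, not16(x), nx)   # nx: flip the bits of x
    y = mux16(y, 0, zy)          # zy: zero the y input
    y = mux16(y, not16(y), ny)   # ny: flip the bits of y
    out = mux16(x & y, (x + y) & MASK, f)  # f: And (0) or Add (1)
    out = mux16(out, not16(out), no)       # no: flip the bits of the output
    zr = int(out == 0)           # zr: output is zero
    ng = (out >> 15) & 1         # ng: sign bit of the output
    return out, zr, ng
```

With these assumptions, alu(5, 3, 0, 1, 0, 0, 1, 1) selects x-y and returns (2, 0, 0).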
Fifteen gates sounds about right, depending on how you produced the zr and ng signals.
As for efficiency, in hardware those gates are already there, so having them perform alternate computations and then picking the one you want is quite reasonable. There are better ways than using Muxes for several of those operations, but for N2T purposes those kinds of gate-level optimizations are off the table, because you start with only Nand gates to build everything.
If you wanted to go for maximum speed, you would have sixteen pairs of And16 and Add16 chips, each pair fed with one possible hard-coded choice of zx, nx, zy, ny, giving you thirty-two outputs. You would double that by adding the negated version of each. That would give you 64 possible outputs, and you would use your 6 control inputs to drive a 64:1 Mux to choose the one you wanted. These are the types of things that are actually done when you want something to run as fast as possible and are willing to buy the silicon real estate to do it.
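As a rough illustration of that last idea, here is a Python sketch that builds all 64 candidate results and then indexes into them with the six control bits. The function names and the bit ordering of the selector (zx as the most significant bit, no as the least) are assumptions made for this example. The loop stands in for 64 circuits that would all run in parallel in hardware, so the sketch models the structure, not the speed.

```python
MASK = 0xFFFF  # keep every value on a 16-bit bus


def candidates(x, y):
    """Hypothetical helper: all 64 possible results, indexed by the control bits."""
    outs = [0] * 64
    for sel in range(64):
        # Unpack the hard-coded control choice for this candidate circuit.
        zx, nx, zy, ny, f, no = [(sel >> k) & 1 for k in (5, 4, 3, 2, 1, 0)]
        a = 0 if zx else x
        a = ~a & MASK if nx else a
        b = 0 if zy else y
        b = ~b & MASK if ny else b
        r = (a + b) & MASK if f else a & b   # Add16 or And16
        outs[sel] = ~r & MASK if no else r   # plain or negated version
    return outs


def fast_alu(x, y, zx, nx, zy, ny, f, no):
    # The six control inputs act as the select lines of a 64:1 Mux.
    sel = zx << 5 | nx << 4 | zy << 3 | ny << 2 | f << 1 | no
    return candidates(x, y)[sel]
```

Under these assumptions, fast_alu(5, 3, 0, 0, 0, 0, 1, 0) picks the x+y candidate and returns 8.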