2s complement

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

2s complement

HighSchoolerWhoAsksHowTooMuch
Hi,

I was trying some stuff out with the provided Jack Compiler and used the number -32768. This gave me an "integer constant too big" error. My understanding was that two's complement is asymmetric and that it can (with 16 bits) represent integers from -32768 to 32767. Why am I getting this error?

Thanks!
Reply | Threaded
Open this post in threaded view
|

Re: 2s complement

WBahn
Administrator
So here is where it pays off to understand the underlying tool chain -- and this is what you will REALLY benefit from in this course. But it's a learned skill that takes practice. What you want to be able to do is put yourself in the tool chain's position and walk it through. The more you know about the tool chain, the easier this is to do, but the more you understand the concepts in general, the better you will be able to apply similar reasoning even to languages and tool chains you are unfamiliar with and even to problems that have nothing to do with computers. You are learning to think under the hood of the problem.

So let's walk through the Jack2Hack tool chain.

What happens when you write a Jack program and use -32768 as a constant?

Well... first off, integer constants can't be negative. Look at the language specification in Figure 9.5 (of 1st Ed). "Integer constants must be positive and in standard decimal notation, e.g. 1984. Negative integers like -13 are not constants, but rather expressions consisting of a unary minus operator applied to an integer constant."

So the ACTUAL integer constant you are trying to use is 32768, which is outside the range of a 16-bit 2's complement integer.

So there's ONE answer to your question: The language spec doesn't allow it. But why doesn't it allow it? If -32768 is a representable 16-bit 2's complement value, shouldn't our language support it?

For that, we need to delve deeper down the tool chain.

The Jack compiler is going to convert references to constants into pops from the 'constant' memory segment. Figure 7.6 defines these segments, and we see that this segment holds all of the constant from 0 through 32767. So, again, we could stop and say that we can't because the VM language spec doesn't allow it. But why not?

How is a pop from the constant memory segment translated to assembly language?

It will use an A-type instruction, which in this case would be @32768. However, the specification for the A-type instruction given in Section 4.2.2 says that the value is either a non-negative 15-bit decimal number or a symbol referring to such a number. So yet another point at which we can say that it's because the spec doesn't allow it. But why not?

How is an A-type instruction assembled into Hack machine code? As a 0 followed by the fifteen bits that represent the number in binary.

Here is where we really get to the underlying issue. The first bit of a 16-bit Hack instruction tells the hardware what kind of instruction it is -- a 0 for an A-type and a 1 for a C-type. So only fifteen bits are available to represent the actual number.

This is also why the A-type instruction has to use non-negative numbers, because negative numbers (in 2's complement) has the most-significant bit set as a 1, which would be interpreted by the hardware as a C-type instruction.

So let's see if there are some other takeaways from this.

First, we can start to see why it is going to be pretty important that the tools we write validate the input that we are working with. We can't assume that the user is going to always get it right. If we don't validate that the integer constants in the Jack code are between 0 and 32767 (inclusive), then if (when) someone uses the same line of thinking that you did (and it's not an unreasonable think that you should be able to use -32768 in a programming language that works with 16-bit 2's complement integers), the tool would compile what should be an A-type instruction into a C-type instruction while letting the user believe that everything is fine, except the program is going to not work correctly. Imagine trying to track that bug down if all you know is Jack!

Second, we can ask whether or not it would be possible to let Jack support the full range of 16-bit 2's complement integers and how we might do that.

At what level should we do this? At what level can we do this?

Could we write our assembler to be able to handle it? If we can, then we don't need to change the VM translator or the Jack compiler at all. That would also mean that our Assembly, VM, and Jack programs can ALL use -32768 (but keeping in mind that we still can't use +32768).

But how would we do this in the assembler?

Our Hack assembler is a very simple assembler in that it translates each line of assembly language into exactly one corresponding Hack machine instruction. But we can't get -32768 into the A-register in one instruction, so we would have to violate this paradigm. But that's okay. Real assemblers do this as a matter of course. There are some instructions, called macro instructions, that the assembler reads in and then outputs a sequence of machine instructions -- sort of like a baby VM-translator.

Doing this, we could allow our A-type instructions to allow the full range of 16-bit 2's complement values. The logic would look something like this:

@value

If value is a number (not a symbol -- those are dealt with like normal), the verify that it is in the range -32768 to +32767.

Then, if value is non-negative, implement it like normal as 0(15-bit value).

If it is negative, but not -32768, output two instructions: The first loads the absolute value of the number into the A-register and the second implements A=-A.

If it IS -32768, then we can use load -32767 into the A-register (using the two instructions above) followed by A=A-1, for a total of three machine code instructions.

Here's a pop quiz for you: We could also load +32767 into the A-register followed by A=A+1, for just two instructions. Why does this trick work?

I'll let you think of how you might implement support for -32768 in the VM translator (so that it works with the unmodified assembler) or the Jack compiler (so that it works with the unmodified VM translator).

So we've established that we COULD support the full range of 16-bit 2's complement integers in our programs, but now let's consider whether we SHOULD. Does being able to use -32768 in a program actually do anything for us?

Probably not. It is a rather schizophrenic value (which is why that trick a while back works!). What happens if you take the negative of -32768? Do you get the value you would like it to be? Do we want to make it easy for programmers to use schizophrenic values?


Reply | Threaded
Open this post in threaded view
|

Re: 2s complement

HighSchoolerWhoAsksHowTooMuch
First of all, I would like to thank you for taking the time to write this. You took one question about compilers and articulated why Nand2Tetris is such a useful course.

I see your point that it makes sense not to have a language where you can have an integer whose absolute value is not the integer's actual absolute value.

I have a question about the following:

WBahn wrote
Doing this, we could allow our A-type instructions to allow the full range of 16-bit 2's complement values. The logic would look something like this:

@value

If value is a number (not a symbol -- those are dealt with like normal), the verify that it is in the range -32768 to +32767.

Then, if value is non-negative, implement it like normal as 0(15-bit value).

If it is negative, but not -32768, output two instructions: The first loads the absolute value of the number into the A-register and the second implements A=-A.

If it IS -32768, then we can use load -32767 into the A-register (using the two instructions above) followed by A=A-1, for a total of three machine code instructions.
I thought that negative numbers in a jack program (say, -17) are compiled into a push constant 17, neg. Therefore, the assembler would never see a negative number. This might however just be how my compiler handles it (I am either misunderstanding my compiler, or this difference is because I am doing the second version of the book).

To answer the question that you posed to me about how a VM interperter and compiler could each handle this situation, I would probably do the following:

A VM interpreter would interpert "push constant 32678, neg" the same way that it would interpert "push constant 32677, push constant 1, add."

A compiler could compile -32678 into "push constant 32677, push constant 1, add."

WBahn wrote
Here's a pop quiz for you: We could also load +32767 into the A-register followed by A=A+1, for just two instructions. Why does this trick work?
This would work because of the overflow element of the twos complement, where incrementing 32767 by one would yield -32768 (I think).

Again, thank you so much for taking the time to write your response!
Reply | Threaded
Open this post in threaded view
|

Re: 2s complement

WBahn
Administrator
You raise a valid point that slipped past me. You would need to consider the entire tool chain to see what changes had to be made in each, even if the lion's share of the support is provided in one of them.