Assembler incorrectly interprets numbers larger than 2^15-1 as variable symbols

Nikaoto

Problem

I've discovered that the Hack assembler provided by Nand2Tetris has a minor bug: it interprets any numeric operand larger than 2^15-1 (32767) as a variable symbol. The 2008 MIT Press edition of the book states on page 72 that "A user-defined symbol can be any sequence of letters, digits" ... "that does not begin with a digit". Yet, as the screenshot and included code show, the A-instruction "@32768" does NOT write 0 into the A register (32768 is a 1 followed by 15 zeros in binary, so truncating it to 15 bits gives 0). Instead, the assembler allocates the next available variable address in RAM for it.
The software suite I am using is the latest, version 2.6.
I have also included a patch file for HackAssembler.java which correctly throws an error when it encounters a number larger than 2^15-1, since the computer can only address 2^15 words of memory.
All relevant files are linked at the end of this post.
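
For readers who want to see the two alternatives side by side (reject the constant, or truncate it to 15 bits), here is a minimal sketch. It is NOT the code from HackAssembler.patch or from hasm; the function name `encode_a_constant` and the `truncate` flag are hypothetical and exist only for illustration.

```python
MAX_A_VALUE = 2**15 - 1  # 32767, the largest value that fits in 15 bits


def encode_a_constant(operand, truncate=False):
    """Encode an all-digit A-instruction operand as a 16-bit binary string.

    Hypothetical helper for illustration only.
    """
    # Anything that is not purely ASCII digits is not a numeric constant.
    if not (operand.isascii() and operand.isdigit()):
        raise ValueError(f"not a numeric constant: {operand!r}")
    value = int(operand)
    if value > MAX_A_VALUE:
        if not truncate:
            # Option 1: report the out-of-range constant and stop.
            raise ValueError(f"constant {value} exceeds {MAX_A_VALUE}")
        # Option 2: keep only the low 15 bits, so 32768 becomes 0.
        value &= MAX_A_VALUE
    # The leading bit of an A-instruction is always 0.
    return format(value, "016b")
```

With `truncate=True`, "32768" encodes to sixteen zeros and "32769" to a word ending in 1; with the default, both raise an error.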

Example

Here is a small program I made to illustrate the issue.


// Testing to see if @32768 is recognized as a value or symbol definition. The
// assembler should either give 15 zeros as the value or give an error that the
// value is out of range. Instead what it does is, it creates a symbol named
// "32768" and stores it in the next available block of RAM (after 16).

    // Write 99 into block of RAM at address 32767
    @99
    D=A
    @32767
    M=D
    // The code above is valid and should assemble and work as intended

    // Now write 69 into block of RAM at address 32768 (which doesn't exist)
    @69
    D=A
    @32768
    M=D
    // When assembling this, the assembler should either give an error saying that
    // we're trying to access memory out of bounds and halt, or it should
    // convert 32768 to binary and truncate it, giving us 0. But it instead
    // recognizes 32768 as a variable symbol definition, which gets stored at the next
    // available dynamic RAM block (in this case, 16). This is incorrect behavior,
    // because according to the definition of "Constants and Symbols" on page 72 of the
    // book, "A user-defined symbol can be any sequence of letters, digits" ... "that does not
    // begin with a digit".

    // So, now we can do something like:
    @66
    D=A
    @32768
    M=D

    // Which, according to the manual should store 66 at address 32768, but instead
    // stores 66 at 16.

    // The same applies to any number larger than 2^15-1:
    @42
    D=A
    @32769
    M=D
    // A variable symbol named "32769" is created at address 17

Here's the Assembler GUI assembling the code above incorrectly without any errors or warnings:
[Screenshot: GUI assembler giving no warnings or errors while assembling and producing incorrect binary code.]
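
The behavior described above is consistent with an assembler that treats any operand it cannot accept as an in-range constant as a symbol. The sketch below is only a model of that reported behavior, not the actual source of the supplied assembler. The function name `resolve_operands` is hypothetical, and the predefined symbols (SP, R0, SCREEN, and so on) are left out for brevity.

```python
def resolve_operands(operands):
    """Model of the reported behavior: out-of-range numbers become variables.

    Hypothetical illustration; predefined symbols are deliberately omitted.
    """
    symbols = {}       # variable name -> allocated RAM address
    next_free = 16     # variables are allocated from RAM address 16 upward
    resolved = []
    for op in operands:
        is_number = op.isascii() and op.isdigit()
        if is_number and int(op) <= 32767:
            # In-range constant: used as is.
            resolved.append(int(op))
        else:
            # Everything else falls through to variable allocation,
            # including all-digit operands above 32767.
            if op not in symbols:
                symbols[op] = next_free
                next_free += 1
            resolved.append(symbols[op])
    return resolved
```

Feeding it the A-instruction operands from the listing, in order ("99", "32767", "69", "32768", "66", "32768", "42", "32769"), maps "32768" to address 16 both times and "32769" to address 17, which matches what the post reports.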

Attachments

Example hack code: label_test.asm
Example hack code compiled with included assembler: label_test.hack
Example hack code compiled with my assembler (which truncates large values): label_test_correct.hack
Fix patch file: HackAssembler.patch


Oh also, thank you for the amazing book and course. Just trying my best to give back for everything this course has taught me.

Re: Assembler incorrectly interprets numbers larger than 2^15 as variable symbols

Nikaoto
Forgot to add: if you wish to see the source for my assembler, you can get it from my private git at https://git.nikaoto.com/hasm or my GitHub at https://github.com/Nikaoto/hasm

Re: Assembler incorrectly interprets numbers larger than 2^15 as variable symbols

WBahn
Administrator
This is not a bug in the assembler, but rather the invocation of undefined behavior. The tool is free to do whatever: crash, issue a warning, generate code that makes no sense, start a global thermonuclear war, or do exactly what you would like it to do. All of those are completely and equally valid because the behavior is undefined.

One thing that needs to be kept in mind -- and this is mentioned quite pointedly in the book -- is that the projects all assume that only correct code will be fed to any of the tools. That this is unrealistic for a real tool is acknowledged, but this is part of the price for narrowing the scope of the projects so that they can be reasonably done over the course of one semester.

It can certainly be argued that the tools that the authors provide would benefit from identifying and reporting when undefined behavior is invoked and to quite some degree they do. After all, even if all of the test programs that the authors provide turn out to be completely correct, it is a near-sure thing that the code written or produced by students will sometimes not be.

But in some cases the authors have gone too far in trying to do this. They trap attempts by the VM to access memory outside the HEAP or the SCREEN, but there is fundamentally no reason to do this, and doing so prevents using common tricks to access arbitrary memory for purposes such as doing Peek and Poke implementations or for implementing test code to benchmark programs.

More annoying is that their CPU Emulator does not faithfully implement their implicit ALU design -- but in their defense there is no claim that it does, only that it faithfully implements the 28 instructions defined by the Hack Instruction Set.

Re: Assembler incorrectly interprets numbers larger than 2^15 as variable symbols

Nikaoto
Got it, thanks for the reply! I did not realize this could be counted as undefined behavior.

Re: Assembler incorrectly interprets numbers larger than 2^15 as variable symbols

WBahn
Administrator
It could actually be argued that it is a syntax error and, in that case, it should throw an error. Certainly if you had it use just the least significant 15 bits of the number THAT would be undefined behavior.

Another reason not to trap this particular case (though I doubt the authors had this in mind) is that a useful extension is to allow full sixteen-bit A-type instructions, both signed and unsigned.

So

@ 42000

or

@ -1234

could be supported.

Of course, THEIR assembler doesn't support these, so it would be nice if it at least reported that undefined behavior has been invoked.
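
The thread does not say how such an extension would be implemented. One possible approach, shown here purely as an assumption, is to have the assembler emit the bitwise complement of the value followed by A=!A whenever the top bit is set, since a single A-instruction can only carry 15 bits. The function name `lower_wide_a` is hypothetical.

```python
def lower_wide_a(value):
    """Expand a 16-bit '@value' into standard Hack instructions.

    Hypothetical sketch: accepts signed (-32768..32767) and unsigned
    (0..65535) values and returns a list of assembly lines.
    """
    if not -32768 <= value <= 65535:
        raise ValueError(f"{value} does not fit in 16 bits")
    word = value & 0xFFFF  # two's-complement bit pattern of the value
    if word <= 0x7FFF:
        # Fits in 15 bits: an ordinary A-instruction is enough.
        return [f"@{word}"]
    # Top bit set: load the complement (which fits in 15 bits),
    # then invert A to recover the intended bit pattern.
    return [f"@{~word & 0xFFFF}", "A=!A"]
```

Under this scheme "@ 42000" would become "@23535" followed by "A=!A", and "@ -1234" would become "@1233" followed by "A=!A". Because the expansion takes two instructions instead of one, an assembler doing this would have to account for the extra instruction when computing label addresses.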