Project 6 - Assembler Conformance

WBahn
Administrator
The Hack Assembly Language has a precise specification (provided in Section 4.2 of the 1st Edition), and any program that adheres to that specification should produce exactly the same Hack machine code file no matter which assembler is run on it.

For instance, the specification states that space characters are ignored. This means that all of the following lines are equivalent.

D=D-1;JLE // Decrement counter and exit loop when D=0
D = D-1; JLE // Decrement counter and exit loop when D=0
D = D - 1; J LE // Decrement counter and exit loop when D=0
D = D-1 ; JLE // Decrement counter and exit loop when D=0
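One way an assembler might honor the "spaces are ignored" rule is to normalize every line before parsing it. The following is only a minimal sketch in Python; the function name normalize is hypothetical, and it assumes that a comment begins at the first // on the line.

```python
def normalize(line: str) -> str:
    """Drop a trailing // comment, then remove all whitespace from what is left."""
    code = line.split("//", 1)[0]   # keep only the text before the comment
    return "".join(code.split())    # split on whitespace and glue the pieces back together


variants = [
    "D=D-1;JLE // Decrement counter and exit loop when D=0",
    "D = D-1; JLE // Decrement counter and exit loop when D=0",
    "D = D - 1; J LE // Decrement counter and exit loop when D=0",
    "D = D-1 ; JLE // Decrement counter and exit loop when D=0",
]

# All four spellings collapse to the same instruction text.
assert all(normalize(v) == "D=D-1;JLE" for v in variants)
```

With this step done first, the rest of the assembler only ever sees one spelling of each instruction.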

A "conforming" assembler is one that will properly assemble any assembly language program in which every line adheres to the language specification.

But what about the following lines?

MD=D+M
DM=D+M
MD=M+D
DM=M+D

Should an assembler treat these all as meaning the same?

Only the first of these adheres to the language specification. The language specification defines exactly seven mnemonics for the dest field and DM is not one of them. Similarly, it specifies exactly twenty-eight mnemonics for the comp field, and M+D is not one of them.

As a reminder, D+M is an "instruction mnemonic" and NOT an expression. It is merely a sequence of characters that was chosen to make it easier for humans to remember what that instruction does, compared to the sequence 1000010.
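To make the "mnemonic, not expression" point concrete, here is a rough sketch of the comp field handled as a plain table lookup. The names COMP and comp_bits are hypothetical, and the bit patterns are the comp table from the specification as I understand it (the a bit followed by the six c bits). Nothing is parsed or evaluated; a string is either a key in the table or it is not.

```python
# The eighteen comp mnemonics with a=0, mapped to their six c bits.
_A_COMP = {
    "0": "101010", "1": "111111", "-1": "111010",
    "D": "001100", "A": "110000", "!D": "001101", "!A": "110001",
    "-D": "001111", "-A": "110011", "D+1": "011111", "A+1": "110111",
    "D-1": "001110", "A-1": "110010", "D+A": "000010", "D-A": "010011",
    "A-D": "000111", "D&A": "000000", "D|A": "010101",
}

# Prefix the a bit. Each mnemonic that mentions A has an M twin with a=1.
COMP = {m: "0" + bits for m, bits in _A_COMP.items()}
COMP.update(
    {m.replace("A", "M"): "1" + bits for m, bits in _A_COMP.items() if "A" in m}
)


def comp_bits(mnemonic: str):
    """Return the seven comp bits, or None if the mnemonic is not in the table."""
    return COMP.get(mnemonic)


assert len(COMP) == 28                 # exactly twenty-eight mnemonics
assert comp_bits("D+M") == "1000010"   # the sequence quoted above
assert comp_bits("M+D") is None        # not a mnemonic, so no entry
```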

If the assembler is "strictly conforming", then it should reject all but the first one. However, the writer of an assembler might choose to accept all of them, in an effort to not force the programmer to have to remember a bunch of "fine print". The result is an assembler "extension" -- meaning a feature or capability that is not required by the language specification.
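The difference between the two behaviors can be sketched as a single switch on the dest lookup. This is only an illustration: dest_bits and its strict parameter are hypothetical names, and the table is the 1st Edition dest table as I understand it (the empty string stands for an omitted dest field).

```python
DEST = {
    "": "000", "M": "001", "D": "010", "MD": "011",
    "A": "100", "AM": "101", "AD": "110", "AMD": "111",
}


def dest_bits(mnemonic: str, strict: bool = True) -> str:
    """Translate a dest mnemonic into its three bits.

    strict=True accepts only the spellings in the table.
    strict=False adds an extension: the letters A, D, M in any order,
    each used at most once.
    """
    if mnemonic in DEST:
        return DEST[mnemonic]
    letters = set(mnemonic)
    if (not strict and mnemonic
            and letters <= set("ADM")
            and len(letters) == len(mnemonic)):
        value = (4 if "A" in letters else 0) \
              | (2 if "D" in letters else 0) \
              | (1 if "M" in letters else 0)
        return format(value, "03b")
    raise ValueError(f"invalid dest mnemonic: {mnemonic!r}")


assert dest_bits("MD") == "011"                 # conforming spelling
assert dest_bits("DM", strict=False) == "011"   # accepted only as an extension
```

In strict mode, dest_bits("DM") raises ValueError, which is the "reject all but the first one" behavior described above.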

So what about the following?

D=D+M
d=d+m

We could write an extension that allows mnemonics to be either uppercase or lowercase, but the result would be a non-conforming assembler because the language specification explicitly states that all assembly mnemonics must be written in uppercase.


A philosophical question (meaning that there is no absolute, correct answer) is whether we are actually being nice to the programmer by including extensions such as these. Consider a programmer who has written lots of programs that rely on our assembler's extensions and then, for whatever reason, switches to a different assembler that doesn't support them. They now face the potentially daunting task of fixing lots of code.

Many tools (compilers, assemblers, interpreters, etc.) support language extensions, some of them unique to a single tool and some of them so widely supported that they are almost de facto parts of the language.

So what about the assembler that you write? Certainly, it should be conforming, but should it be strictly conforming? That's up to you. Just keep in mind that if it isn't, then programs you write that your assembler can process may not be handled successfully by the supplied assembler.

It's worth noting that the supplied assembler is not in strict conformance with the language specification. For example, it will accept either A+D or D+A as the comp field. But it will not accept 1+D instead of D+1. It also appears to require the letters of the dest mnemonic to appear in the specified order, so while it will accept MD, it will not accept DM.

This means that you can write programs that the supplied assembler will accept but that your assembler might not. This does NOT mean that your assembler is wrong -- it might simply be the case that the program relies on an extension that the supplied assembler supports and that yours does not.