Specifications vs Conventions

WBahn

Administrator

Two concepts commonly thrown around regarding programming languages (and many other engineering contexts) are specifications and conventions. At first, these often seem synonymous, but there is a very important distinction between them.

A "specification" is a requirement. It defines what is and is not acceptable. A convention is merely a widely-accepted way of doing something. A few examples relating to the Hack assembly language should serve to make things clear.

Here are some specifications from the Hack Assembly Language Specification (Section 4.2 of the 1st edition).

"Constants must be non-negative and are always written in decimal notation."

"A user-defined symbol can be any sequence of letters, digits, underscore (_), dot (.), dollar sign ($), and colon (:) that does not begin with a digit."

"Space characters are ignored."

"Blank lines are ignored."

In the C-instruction:

The comp field is required.

If the dest field is omitted, so is the '=' that separates it from the comp field.

If the jump field is omitted, so is the ';' that separates it from the comp field.

These are just some of the specifications, but more than sufficient for our purposes.

This means that we can't choose to end comp-only C-instructions with a semi-colon just because it makes the code look more like our favorite programming language. So

D=D+1;

violates the specification.

It means that we can't omit the comp field when we just want to do an unconditional jump. So

JMP

violates the specification.

Both of these are very reasonable statements and it is unlikely that anyone reading a program containing them would have any doubt as to what was meant -- with the exception that the bits in the C-instruction that control the ALU are not well defined in the case of JMP, but it doesn't matter what they are and we could easily define that they are to be all zero in this case.
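To make the C-instruction rules concrete, here is a minimal sketch of a strict check, written in Python purely for illustration (the course does not prescribe a language). The function name split_c_instruction is hypothetical, and the mnemonic tables are my transcription of the ones in the book rather than anything quoted above. Comment handling is left out.

```python
# Mnemonic tables transcribed from the book's C-instruction description
# (an assumption of this sketch; they are not quoted in the post).
DEST = {"M", "D", "MD", "A", "AM", "AD", "AMD"}
COMP = {
    "0", "1", "-1", "D", "A", "M", "!D", "!A", "!M", "-D", "-A", "-M",
    "D+1", "A+1", "M+1", "D-1", "A-1", "M-1",
    "D+A", "D+M", "D-A", "D-M", "A-D", "M-D",
    "D&A", "D&M", "D|A", "D|M",
}
JUMP = {"JGT", "JEQ", "JGE", "JLT", "JNE", "JLE", "JMP"}


def split_c_instruction(text):
    """Return (dest, comp, jump); dest and jump are None when omitted.

    Raises ValueError if a separator appears without its field or if the
    comp field is missing or unknown.
    """
    # "Space characters are ignored", so drop all whitespace first.
    text = "".join(text.split())

    dest = jump = None
    if "=" in text:
        # An '=' is only allowed when a dest field comes with it.
        dest, text = text.split("=", 1)
        if dest not in DEST:
            raise ValueError(f"bad dest field: {dest!r}")
    if ";" in text:
        # A ';' is only allowed when a jump field comes with it.
        text, jump = text.split(";", 1)
        if jump not in JUMP:
            raise ValueError(f"bad jump field: {jump!r}")
    if text not in COMP:
        # Covers both a missing comp field and a bare "JMP".
        raise ValueError(f"bad or missing comp field: {text!r}")
    return dest, text, jump
```

With this sketch, "D=D+1" and "0;JMP" are accepted, while "D=D+1;" and "JMP" both raise ValueError, which is exactly the strict reading described above.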

Other instructions that we might find very useful are also prohibited by the specification. For instance, we often want to load values into the A register that have a desired pattern of bits. Perhaps we want the bits to alternate between 0 and 1, starting with a 0 in the most-significant bit. If our A-instruction supported binary or hexadecimal constants, this would be trivial. Perhaps we could use:

@ 0b 0101 0101 0101 0101

or

@ 0h 5555

But these violate the specification, and instead we have to use the very non-intuitive form

@ 21845

Now, we could choose to support any or all of these statements in our assembler, but in doing so we are actually defining a new language that is an extension of the official Hack Assembly Language. Software development tools do this all the time, but most include an option to limit them to strict compliance with the language standard so that developers can be confident that their programs will be compatible with other tools.
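One way to picture the "strict compliance" option is a flag on the routine that reads A-instruction constants. The following is only a sketch in Python: the function name parse_a_constant, the strict parameter, and the 0b/0h prefixes are all hypothetical (the prefixes are the made-up extensions from the examples above, not part of any Hack tool). The 15-bit range check comes from the A-instruction format in the book, not from the passages quoted here.

```python
import re


def parse_a_constant(operand, strict=True):
    """Return the integer value of an A-instruction constant.

    strict=True accepts only non-negative decimal, as the specification
    requires. strict=False also accepts the hypothetical 0b (binary) and
    0h (hexadecimal) prefixes as a language extension.
    """
    # "Space characters are ignored", so drop all whitespace first.
    text = "".join(operand.split())

    if re.fullmatch(r"[0-9]+", text):
        value = int(text, 10)
    elif not strict and re.fullmatch(r"0b[01]+", text):
        value = int(text[2:], 2)
    elif not strict and re.fullmatch(r"0h[0-9A-Fa-f]+", text):
        value = int(text[2:], 16)
    else:
        raise ValueError(f"not a valid constant: {operand!r}")

    # The A-instruction leaves 15 bits for the value.
    if value > 0x7FFF:
        raise ValueError(f"constant out of range: {operand!r}")
    return value
```

In extended mode, "0b 0101 0101 0101 0101" and "0h 5555" both come out as 21845, the same value the standard-conforming "@ 21845" loads. In strict mode both are rejected, so a program that assembles cleanly with strict=True should also assemble with any other conforming assembler.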

What about "conventions"? These are either suggestions or recognitions of how "most people" do things. A big part of programming -- and a part that is rather naturally overlooked by beginning programmers -- is the goal of making your code easily understandable to the people that read it. That might be a supervisor or a customer, it might be a coworker on the project or someone tasked with modifying the code ten years from now, or it might be you the day after tomorrow when it comes time to track down that bug you just discovered. Conventions establish common ways of doing common things in order to make communicating what your code does and how it does it much easier.

So does the Hack assembly language have conventions? Yes -- and most of them are pretty clearly pointed out. For instance, while it is specified that all mnemonics must be in upper case and that the rest are case sensitive (meaning that myvariable, MyVariable, and MYVARIABLE are required, by the language specification, to be seen as three unique and unrelated symbols), it is stated that, by convention, uppercase names are used for labels and lowercase names are used for variables. This aids program readability because when you see

@SOMENAME

you can infer that SOMENAME refers to a label in the code, while

@somename

is referring to a variable.

However, these are inferences you are making based on the assumption that whoever wrote the program is faithfully following the accepted conventions. In point of fact, that person may be completely unaware of them and you need to be able to make allowances for that.

As the developer of an assembler, you also need to recognize that conventions are not requirements. If your assembler requires 'SOMENAME' to be a label and 'somename' to be a variable, then your assembler is not conforming to the language specification -- meaning that someone could write a program that is in complete compliance with the language spec and that your assembler would not accept it.
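As a rough illustration of deciding label versus variable without looking at letter case, here is a Python sketch. The function name classify_symbols is hypothetical, and it assumes the book's (Xxx) syntax for declaring a label, which is not among the passages quoted above. It ignores comments and the predefined symbols (so @SP would be reported as a variable here), both of which a real assembler has to handle.

```python
import re

# Letters, digits, underscore, dot, dollar sign, colon; no leading digit.
SYMBOL = re.compile(r"[A-Za-z_.$:][A-Za-z0-9_.$:]*")


def classify_symbols(lines):
    """Return (labels, variables) as two sets of symbol names.

    A symbol is a label only if it is declared as (name) somewhere in
    the program. Upper or lower case plays no part in the decision.
    """
    # "Space characters are ignored", so drop all whitespace first.
    stripped = ["".join(line.split()) for line in lines]

    labels = set()
    for s in stripped:
        if s.startswith("(") and s.endswith(")"):
            name = s[1:-1]
            if not SYMBOL.fullmatch(name):
                raise ValueError(f"invalid label name: {name!r}")
            labels.add(name)

    variables = set()
    for s in stripped:
        if s.startswith("@"):
            name = s[1:]
            # Constants begin with a digit, so they never match SYMBOL.
            if SYMBOL.fullmatch(name) and name not in labels:
                variables.add(name)
    return labels, variables
```

Given a program that declares (somename) and also uses @SOMENAME, this reports somename as a label and SOMENAME as a variable. That is the reverse of the convention, but it is a perfectly legal program.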

Another convention mentioned in the text is that machine language programs have a ".hack" extension while assembly language programs have a ".asm" extension. The language specification is actually silent on what the extensions can and can't be, or whether or not an extension is even required.

The description of the assembler that you are to write in Section 6.3 (again, of the 1st Ed) clearly states that your assembler reads a text file named "Prog.asm" and outputs a text file named "Prog.hack". It also explicitly indicates that the user is expected to supply the entire file name, including the extension, as a command line argument to your assembler.

Thus we seem to have a conflict, or some level of ambiguity, of what is truly required of a strictly conforming assembler. Since it is just a convention that input files have a ".asm" extension, your assembler should be able to accept files with any extension, or no extension at all. That the user is required to provide the extension when they invoke your assembler is consistent with that. But what should the output file then be called? The first thought would be to remove whatever extension the file name has and add a ".hack" extension, but what if the input file already has a ".hack" extension? That may not make much sense, but it IS allowed by the language standard.

Arguably, the Hack language specification and the requirements for the Hack assembler are slightly at odds in some of the fine details. This is unfortunate, but not uncommon. Specifications for real languages, drafted over years by committees of dozens of highly skilled professionals, still contain ambiguities that are only discovered much later.

So what do you do when you recognize one of them when you are trying to implement your assembler? Ideally, you get a ruling from the people that control the language specification and/or the assembler specifications regarding what was intended and what the preferred interpretation is. In reality, this is seldom an option. So, you have to make a decision regarding what YOU believe the best compromise solution is. As a favor to yourself and others, the issue and your solution should be well documented, perhaps in the comments in the program code, for future reference.

What about this particular case regarding the file name extensions? My two cents? Go ahead and require the input file name to have a ".asm" extension and reject any file that doesn't. That is completely workable for the Nand2Tetris project. If this program had a larger life -- say being used as part of a longer-term project -- my recommendation might well be different.
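That recommendation might look something like the following Python sketch. The function name output_path is hypothetical, and treating the extension as case sensitive (so "Prog.ASM" is rejected) is my own assumption; neither the language specification nor the assembler description settles that point.

```python
from pathlib import Path


def output_path(input_name):
    """Return the .hack path for a given .asm input path.

    Design decision, documented here for future reference: the input
    file must have a ".asm" extension, even though the language
    specification treats that extension only as a convention.
    """
    src = Path(input_name)
    if src.suffix != ".asm":
        raise ValueError(f"expected a .asm file, got: {input_name!r}")
    # Replace only the final extension; the directory part is kept.
    return src.with_suffix(".hack")
```

Called with "Prog.asm" this yields Prog.hack, while "Prog" and "Prog.hack" are both rejected, which sidesteps the question of what to name the output when the input already ends in ".hack".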
