Errors, warnings, lint, ... and reports.


WBahn
Administrator
When developing your assembler, you want to give some thought as to how "nice" you want it to be to the user. At one extreme, you could make no effort in this regard at all. At the end of the day, as long as your assembler produces the correct Hack machine code when given valid Hack assembly code, you have met the project requirements -- the "contract" for Project 6 explicitly states that your assembler will only be given a valid Hack assembly language program. If your program crashes when the input file doesn't conform exactly to the language specification, that's completely acceptable -- as far as meeting the project requirements goes.

But you might want to go beyond that. First off, providing some output to the user is a good way of confirming that your development is proceeding as expected -- and of detecting that it isn't as soon as possible. Also, the output you produce can provide valuable clues as to what the problem is, very likely even making it glaringly obvious exactly where the problem lies and what the fix is.

But also consider that you will need to use an assembler in later projects, most especially in developing the virtual machine translator. You are free to simply use the supplied assembler, but as you have likely already observed, the output it gives when an input program is not completely valid is not always very helpful. Having an assembler that provides much more detailed feedback when issues arise can pay big dividends down the road. Even when the input file is valid, it can be useful to generate a report with useful information, especially if you find yourself trying to track down a bug in your VM translator or wanting (or needing) to improve the performance of the code it generates.

You probably don't have a good feel for what kinds of feedback a "nice" assembler should provide. That's fine -- nothing is engraved in stone and you can always add new features later. A good start might be to think back to the assembly language programs you wrote in Project 4 and, if you had some issues, what kind of feedback would have been helpful had the supplied assembler provided it. It's also helpful to consider the kinds of issues that might crop up and how you would like to deal with them (at least initially).

ERRORS

An "error" is a problem that prevents the assembler from producing correct machine code for the contents of the assembly language file. It is a fatal issue -- the problem must be fixed and the assembler rerun on the modified file.

For the assembler, identifying errors is pretty straightforward. Each line can be examined without regard to what is on any other line. A line can be categorized as having nothing but comments (we'll lump blank lines into this group), being a label, or having exactly one A- or C-instruction (each of these possibly having a comment at the end). After identifying what type of line it is, it can be broken into its constituent pieces and each piece examined to see if it is properly formatted. If not, a pretty minimal level of feedback is to simply tell the user that it isn't properly formatted, though to be nice, you should also tell them what line in the input file the problem was found on. Even if you provide no information about what was wrong, the rules for Hack assembly are simple enough that most people can readily figure out most errors once they are made aware of their existence.
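
As a rough illustration of that categorization step, here is a minimal Python sketch. The function names (strip_comment, classify) and the category strings are made up for this example, and the "anything else is a C-instruction" default is just one possible design choice.

```python
def strip_comment(line: str) -> str:
    """Drop a trailing '//' comment and any surrounding whitespace."""
    return line.split("//", 1)[0].strip()


def classify(line: str) -> str:
    """Return 'blank', 'label', 'a', or 'c' for one source line."""
    text = strip_comment(line)
    if not text:
        return "blank"   # empty lines and comment-only lines
    if text.startswith("("):
        return "label"
    if text.startswith("@"):
        return "a"
    return "c"           # assumption: everything else is a C-instruction


source = ["// setup", "@i", "(LOOP)", "D=M  // load", "", "/ oops"]
kinds = [classify(line) for line in source]
```

Note that the malformed comment "/ oops" falls through to the C-instruction case, so it would be caught later when its fields are checked.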

Consider if your assembler did nothing more than print something like the following to the screen after being run:

15 D;M+1
20 D:JNE
37 @ 1LEFT
42 (UP&LEFT)
52 / Restart from the top-left pixel
59 M=1+M
67 A=A+D
72 DA=D+1

For most of these, you probably spotted the issue right away. The last two or three might have you scratching your head. In Line 59, there is no "1+M" mnemonic -- remember, these are not mathematical expressions, but mnemonics. The problem with Line 67 is that there is no defined computation mnemonic "A+D", only "D+A". The supplied assembler actually accepts this one (it does not accept "1+M"), but a strictly conforming assembler would not. Similarly, in Line 72, the problem is that "DA" is not a defined mnemonic for the destination field; it should be "AD". Some assemblers might accept both, but even the supplied assembler does not.

This is a good time to mention another feature to consider. Many tools, including the supplied assembler, stop processing the input as soon as the first error is encountered. This means that, if the above errors all existed in the same program, you would have to run the assembler once for each error, which can be very frustrating. An alternative you might consider is committing to making at least one full pass through the code and generating feedback about all of the issues that are found even if an error makes it impossible to generate the machine code output. The two-pass nature of the assembler actually makes this pretty simple to implement.
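
One way to get the "report everything in one run" behavior is to collect problems in a list during the pass instead of stopping at the first one. The Python sketch below is only an illustration: check_line and find_errors are hypothetical names, only labels and A-instructions are checked, and the range of decimal values is not verified.

```python
import re
from typing import List, Optional, Tuple

# Hack symbols: letters, digits, '_', '.', '$', ':' and no leading digit.
SYMBOL = re.compile(r"[A-Za-z_.$:][A-Za-z0-9_.$:]*")
DECIMAL = re.compile(r"[0-9]+")


def check_line(text: str) -> Optional[str]:
    """Return an error message for a label or A-instruction, else None."""
    if text.startswith("("):
        if not (text.endswith(")") and SYMBOL.fullmatch(text[1:-1])):
            return "ERROR: L-type - invalid symbol name"
    elif text.startswith("@"):
        value = text[1:]
        if not (DECIMAL.fullmatch(value) or SYMBOL.fullmatch(value)):
            return "ERROR: A-type - invalid symbol or decimal value"
    return None  # C-instruction checks are left out of this sketch


def find_errors(lines: List[str]) -> List[Tuple[int, str, str]]:
    """Scan every line and return (line number, text, message) for each problem."""
    errors = []
    for number, raw in enumerate(lines, start=1):
        text = raw.split("//", 1)[0].strip()
        if not text:
            continue
        message = check_line(text)
        if message:
            errors.append((number, text, message))  # record it and keep going
    return errors


errors = find_errors(["@ 1LEFT", "(UP&LEFT)", "@100", "(LOOP)", "D=M"])
for number, text, message in errors:
    print(number, text, message)
```

The machine code would then be written only if the returned list is empty.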

The next level is to try to provide meaningful feedback regarding what the issue was. Consider if the output had, instead, been something like:

15 D;M+1 ERROR: C-type - invalid jump mnemonic ('M+1')
20 D:JNE ERROR: C-type - illegal character (':')
37 @ 1LEFT ERROR: A-type - invalid character in decimal value
42 (UP&LEFT) ERROR: L-type - invalid character in symbol name
52 / Restart from the top-left pixel ERROR: C-type - invalid character ('/')
59 M=1+M ERROR: C-type - invalid comp mnemonic ('1+M')
67 A=A+D ERROR: C-type - invalid comp mnemonic ('A+D')
72 DA=D+1 ERROR: C-type - invalid dest mnemonic ('DA')

Line 15 was likely meant to be D=M+1, but depending on how your assembler works, this likely would have been categorized as a C-type instruction with no destination field (due to no '=' symbol), but with both comp and jump fields (due to the ';' symbol). Line 52 might have been characterized as a C-type instruction by default -- it wasn't skipped as being a comment (since there was no '//') and it wasn't a label or A-type (didn't start with '(' or '@'), so it must be a C-type.

The point is that even when the feedback isn't the "obvious" error to a human, it provides insight into how the tool interpreted the input, which can greatly aid a human trying to understand the error.

In the case of Lines 67 and 72, the programmer might not be aware that these are invalid, so the feedback at least gives them some guidance regarding what they need to be looking into when they delve into the language specification.
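
To make the field-by-field checking concrete, here is a small Python sketch of splitting a C-instruction on '=' and ';' and validating each piece. The name check_c is hypothetical, the input is assumed to be already stripped of comments and whitespace, and the mnemonic tables (especially the dest list) are my reading of the specification, so verify them against your copy of the text before relying on them.

```python
from typing import Optional

_BASE = "0 1 -1 D A !D !A -D -A D+1 A+1 D-1 A-1 D+A D-A A-D D&A D|A".split()
# The M forms are the A forms with 'A' replaced by 'M'.
COMPS = set(_BASE) | {c.replace("A", "M") for c in _BASE}
DESTS = {"M", "D", "MD", "A", "AM", "AD", "AMD"}   # assumption: check your spec
JUMPS = {"JGT", "JEQ", "JGE", "JLT", "JNE", "JLE", "JMP"}


def check_c(text: str) -> Optional[str]:
    """Return an error message for a C-instruction, or None if it looks valid."""
    dest, jump, rest = None, None, text
    if "=" in rest:
        dest, rest = rest.split("=", 1)    # dest field is present
    if ";" in rest:
        rest, jump = rest.split(";", 1)    # jump field is present
    comp = rest
    if dest is not None and dest not in DESTS:
        return f"ERROR: C-type - invalid dest mnemonic ('{dest}')"
    if comp not in COMPS:
        return f"ERROR: C-type - invalid comp mnemonic ('{comp}')"
    if jump is not None and jump not in JUMPS:
        return f"ERROR: C-type - invalid jump mnemonic ('{jump}')"
    return None


for line in ["D;M+1", "M=1+M", "A=A+D", "DA=D+1", "D=M+1"]:
    print(line, check_c(line))
```

With these tables, the first four lines produce the same messages as Lines 15, 59, 67, and 72 in the listing above. A line such as D:JNE would be reported by this sketch as an invalid comp mnemonic rather than an illegal character, which is another case of the tool showing how it interpreted the input.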

WARNINGS

A "warning" is an issue that doesn't prevent the assembler from producing the correct output for what was given, but that might not be what the user really intended. For example, you might choose to support things like "1+M", "A+D" as comp mnemonics, or "DA" as a dest mnemonic. But it might be nice to let the programmer know that their code isn't strictly compliant and may not work with other assemblers. So those lines might produce something like:

59 M=1+M WARNING: C-type - comp mnemonic ('1+M') translated to ('M+1')
67 A=A+D WARNING: C-type - comp mnemonic ('A+D') translated to ('D+A')
72 DA=D+1 WARNING: C-type - dest mnemonic ('DA') translated to ('AD')
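
If you decide to accept such variants, one simple approach is an alias table that maps each tolerated spelling to its canonical mnemonic and produces a warning whenever it is used. The Python sketch below is an assumption about how that might look; the table names and the canonicalize function are invented for this example, and the tables hold only the three variants mentioned above.

```python
from typing import Optional, Tuple

COMP_ALIASES = {"1+M": "M+1", "A+D": "D+A"}   # tolerated comp variants
DEST_ALIASES = {"DA": "AD"}                   # tolerated dest variants


def canonicalize(field: str, value: str) -> Tuple[str, Optional[str]]:
    """Return (canonical mnemonic, warning or None) for a 'comp' or 'dest' field."""
    table = COMP_ALIASES if field == "comp" else DEST_ALIASES
    if value in table:
        fixed = table[value]
        warning = (f"WARNING: C-type - {field} mnemonic ('{value}') "
                   f"translated to ('{fixed}')")
        return fixed, warning
    return value, None   # already canonical, or invalid and checked elsewhere


comp, warning = canonicalize("comp", "1+M")
print(comp, warning)
```

The canonical mnemonic is then looked up in the normal translation tables, so the generated machine code is the same as if the programmer had written the strict form.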

Even perfectly valid code can be worthy of a warning. It was mentioned in the text that C-instructions that might cause a jump should not contain a reference to M. This is almost always a logic error (as opposed to a syntax error), but it is legal. Hence your assembler must produce the corresponding output, both because it IS valid code and because it might not be a logic error -- the programmer might have some very specific reason for doing what they did. So you might produce something like the following:

83 M;JGT WARNING: C-type - possible conflicting use of A register
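
A check like this can be quite small. The Python sketch below assumes the instruction text has already been stripped of comments and whitespace; the function name jump_uses_m is made up for illustration.

```python
def jump_uses_m(text: str) -> bool:
    """True if a C-instruction has a jump field and refers to M before it."""
    if ";" not in text:
        return False                      # no jump field at all
    before_jump, jump = text.split(";", 1)
    # Only dest and comp are searched, so the 'M' in 'JMP' does not count.
    return bool(jump) and "M" in before_jump


for number, text in [(83, "M;JGT"), (84, "D;JGT"), (85, "0;JMP")]:
    if jump_uses_m(text):
        print(number, text, "WARNING: C-type - possible conflicting use of A register")
```

The instruction is still assembled as written; the warning is just extra output.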

While warnings have the potential to save a lot of grief, we need to be careful not to go overboard. The goal is for the user to take every warning seriously, hopefully with the intention of producing warning-free code. If we issue warnings about things that might cause problems but that usually don't, we run the real risk of the user becoming so annoyed by all the false flags that they end up ignoring all of the warnings.

LINT

This term comes from dryer lint. Most of the time, if we have lint on our clothes we don't worry about it. But when we want to be seen in our most favorable light, such as when giving a speech or receiving an award, we take the time and make the effort to remove it.

The same can be said of our programs, particularly with regard to adhering to use and style conventions. Consider the convention that labels are uppercase and variables are lowercase. If we accidentally violated this convention, it would be nice to have our assembler tell us. Perhaps something like:

31 (profit)  LINT: Labels should be uppercase
53 @FUEL LINT: Variables should be lowercase

Not only will fixing these make your code much more readable, but this can reveal typos that could otherwise be extremely difficult to track down. Consider the following code snippet:

(NEXTPASS)
...

@NEXT_PASS
0;JMP

This is valid code, with the first being a label and the second being a variable. But if these were meant to be the same label, then the program would behave very oddly and it might be difficult for us to spot the mistake, especially if these two uses were separated by a lot of code. But what if we got a lint report saying:

347 @NEXT_PASS LINT: Variables should be lowercase

We would probably recognize very quickly that NEXT_PASS was not meant to be a variable at all -- and we would know exactly what to be looking for in order to figure out why it was not being seen as the label it was meant to be.
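
Here is one way such a lint pass might be sketched in Python. It assumes the input is otherwise valid, treats any @symbol that is neither a label nor a predefined symbol as a variable, and uses a hypothetical function name (lint_case).

```python
PREDEFINED = {"SP", "LCL", "ARG", "THIS", "THAT", "SCREEN", "KBD"}
PREDEFINED |= {f"R{i}" for i in range(16)}


def lint_case(lines):
    """Return (line number, text, message) for symbols that break the case convention."""
    cleaned = [(n, raw.split("//", 1)[0].strip())
               for n, raw in enumerate(lines, start=1)]
    # First pass: every (NAME) line declares a label.
    labels = {t[1:-1] for _, t in cleaned if t.startswith("(") and t.endswith(")")}
    findings = []
    # Second pass: check the case of labels and of anything used as a variable.
    for n, t in cleaned:
        if t.startswith("(") and t.endswith(")"):
            if t[1:-1] != t[1:-1].upper():
                findings.append((n, t, "LINT: Labels should be uppercase"))
        elif t.startswith("@"):
            name = t[1:]
            if name[:1].isdigit() or name in labels or name in PREDEFINED:
                continue          # a constant, a label reference, or a built-in
            if name != name.lower():
                findings.append((n, t, "LINT: Variables should be lowercase"))
    return findings


program = ["(NEXTPASS)", "@count", "M=0", "@NEXT_PASS", "0;JMP",
           "@NEXTPASS", "0;JMP", "(profit)"]
for n, t, message in lint_case(program):
    print(n, t, message)
```

In this small program, @NEXTPASS passes silently because it matches a label, while the misspelled @NEXT_PASS is flagged as an oddly named variable.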

What is, and is not, lint is in the eye of the beholder, and just like with warnings, we need to exercise caution to avoid going overboard.

REPORTS

As your assembler processes the input file, it produces (or has the ability to produce) information that can be extremely useful to the user for a variety of purposes. Even something as simple as telling the user the total number of instructions in the resulting ROM image lets the user get a feel for how much, or how little, space they have left if they want/need to make modifications to their program.

Another very useful block of information is the symbol table that it builds and uses in generating the Hack machine code. If this table is output (to the screen or to some kind of report or log file), the user can review it. One of the things it can be useful for is identifying symbols that shouldn't be there, such as misspelled or miscapitalized symbols.

This information can be enhanced by also including the line number in the .asm file where the symbol is first encountered and/or a list of the ROM addresses (i.e., machine code line numbers) that reference each of the symbols. This information can be very useful in debugging faulty code.
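
A report along these lines could be gathered in a single pass, as in the Python sketch below. It assumes the input is valid, makes no distinction between labels, variables, and predefined symbols, and uses an invented function name (symbol_report) and record layout.

```python
def symbol_report(lines):
    """Return (instruction count, {symbol: {'first_line': n, 'refs': [ROM addresses]}})."""
    symbols = {}
    rom = 0                                   # address of the next instruction
    for number, raw in enumerate(lines, start=1):
        text = raw.split("//", 1)[0].strip()
        if not text:
            continue
        if text.startswith("("):
            # A label declaration takes no ROM space; just note where it was seen.
            symbols.setdefault(text[1:-1], {"first_line": number, "refs": []})
            continue
        if text.startswith("@") and not text[1:2].isdigit():
            entry = symbols.setdefault(text[1:], {"first_line": number, "refs": []})
            entry["refs"].append(rom)         # this instruction references the symbol
        rom += 1
    return rom, symbols


count, table = symbol_report(["@i", "M=1", "(LOOP)", "@i", "D=M", "@LOOP", "D;JGT"])
print("instructions:", count)
for name, info in table.items():
    print(name, "first seen on line", info["first_line"], "referenced at ROM", info["refs"])
```

A symbol with an empty list of references, or one that is referenced only once, is often worth a second look.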

As with the other types of supplemental output, you can start off with what you think is easy and useful and then add additional features as warranted -- such as when you are debugging a program. Think about what kind of information would be useful to have and then see if there is a reasonable way to get your assembler (or other tool down the road) to report that information.