Ascii Codes instead of true binary?

matt.kroger
This is part question, part comment.

Am I correct in assuming (based on some of the pre-packaged .hack programs) that the assembler is expected to output 16 ASCII (or UTF-8) characters per instruction, one instruction per line, so that the final output has a newline / CRLF after each group of 16? I had assumed we would have to do some interesting math, most likely using bitwise operators, to create true binary output. While simplifying it this way makes sense, I'm kind of wondering why the authors would gloss over this pretty critical piece. In most cases they're very thorough and let you know when they're oversimplifying something or expecting a "non-standard" output. If I have a suggestion/request, I guess it would be for something that better clarifies the expected format of the output. I can't be the only one this has confused.

Of course, I'm not the world's MOST thorough reader, so I guess I could have just missed it.

Thanks,

Matt

Re: Ascii Codes instead of true binary?

ybakos
Perhaps what you're overlooking is that translating from assembly to binary is, at its core, a simple translation. Mnemonics map to instruction formats, which map to encodings.

If translating to ASCII instead of binary feels unrealistic, realize that generating ASCII characters makes the assembler output easy to read, especially while debugging. Think of it as "simulated binary," just as the hardware was simulated.
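
To illustrate the "simple translation" point, here is a minimal Python sketch of the table lookup. The function name `translate` and the table names are made up for this example, the tables list only a few of the Hack mnemonics, and symbols (labels, variables) are not handled at all. It is a sketch under those assumptions, not a complete assembler.

```python
# Partial lookup tables: only a handful of Hack mnemonics are listed here.
COMP = {"0": "0101010", "1": "0111111", "D": "0001100", "A": "0110000",
        "M": "1110000", "D+1": "0011111", "D+A": "0000010", "D+M": "1000010"}
DEST = {"": "000", "M": "001", "D": "010", "MD": "011", "A": "100"}
JUMP = {"": "000", "JGT": "001", "JEQ": "010", "JMP": "111"}


def translate(instruction: str) -> str:
    """Return the 16-character '0'/'1' string for one symbol-free instruction."""
    instruction = instruction.strip()
    if instruction.startswith("@"):
        # A-instruction: a 0 bit followed by the 15-bit value.
        # (No range check here; values are assumed to fit in 15 bits.)
        return format(int(instruction[1:]), "016b")
    # C-instruction: dest=comp;jump, where dest and jump are both optional.
    dest, _, rest = instruction.rpartition("=")
    comp, _, jump = rest.partition(";")
    return "111" + COMP[comp] + DEST[dest] + JUMP[jump]


if __name__ == "__main__":
    for line in ["@2", "D=D+1", "0;JMP"]:
        print(translate(line))
```

No bitwise math is needed for this output format, because each field is already a string of '0' and '1' characters that can simply be concatenated.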

Re: Ascii Codes instead of true binary?

cadet1620
Administrator
In reply to this post by matt.kroger
From 6.2.1
Binary Code (.hack) Files: A binary code file is composed of text lines. Each line is a sequence of 16 "0" and "1" ASCII characters, coding a single 16-bit machine language instruction. Taken together, all the lines in the file represent a machine language program. When a machine language program is loaded into the computer's instruction memory, the binary code represented by the file's nth line is stored in address n of the instruction memory (the count of both program lines and memory addresses starts at 0).
Note that the Nand2Tetris tools require ANSI/ASCII encoding, not UTF-8 with a byte order mark. Some editors start UTF-8 encoded files with a "byte order mark" (0xEF, 0xBB, 0xBF) that the tools can't handle.
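
As a rough illustration of the format quoted above, here is a minimal Python sketch that writes 16-bit values as lines of 16 characters and checks a file for the byte order mark. The helper names (`to_hack_line`, `write_hack`, `has_utf8_bom`) are invented for this example and are not part of the Nand2Tetris tools.

```python
import codecs


def to_hack_line(word: int) -> str:
    """Format one 16-bit value as 16 '0'/'1' characters."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError(f"not a 16-bit value: {word}")
    return format(word, "016b")


def write_hack(path: str, words) -> None:
    """Write one instruction per line; line n ends up at address n."""
    # encoding="ascii" means no byte order mark is ever written.
    with open(path, "w", encoding="ascii") as f:
        for word in words:
            f.write(to_hack_line(word) + "\n")


def has_utf8_bom(path: str) -> bool:
    """Report whether the file starts with the bytes 0xEF, 0xBB, 0xBF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8
```

For example, `write_hack("Prog.hack", [2, 0xEA87])` would produce a two-line file, and `has_utf8_bom` can be run on a file saved by a text editor to see whether the editor added the mark.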

For what it's worth, representing binary data with ASCII characters has a long history. Back in the late 70s we sent punched paper tape in BNPF format to part houses for them to burn ROMs for us. BNPF format used those characters to encode each 8-bit byte using 10 characters: B followed by 8 N or P followed by F. Spaces, returns and line-feeds were allowed between bytes. [B = begin, N = negative (0), P = positive (1), F = final]

     ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    /                                                                                 o                                                                             o          /
   /      oo oo  o o oo  oooo o oo   oo oo oo  o o  o ooo oooooo oo   oo  o oo  oo o o ooo    o oo ooo oooooo oo  o oooo oo  o oo o ooo oooo o ooooo  o o ooooo o oo o        /
  /        o oo  o o  o  oooo o  o   oo oo  o  o o  o  oo oooooo  o   oo  o  o  oo o oo  o    o oo  oo oooooo  o  o oooo  o  o oo o  oo oooo o  oooo  o o  oooo o ooo        /
 /........................................................................................................................................................................../
<          o oo  o    o  oooo    o   oo o   o  o o     oo ooooo   o   oo     o  oo o  oo o    o o   oo ooooo   o  o ooo   o  o oo    oo oooo    oooo  o    oooo o o oo     <
 \          o  oo o    oo    o    ooo  o     oo o oo     o         ooo  oo    oo  o o     oooo o      o         oo o       oo o  o     o    o       oo o       o o          \
  \                 o          o          o          o          o          o                      o          o          o          o          o          o                   \
   \      oooooooooo oooooooooo oooooooooo oooooooooo oooooooooo oooooooooo oooooooooo  oooooooooo oooooooooo oooooooooo oooooooooo oooooooooo oooooooooo oooooooooo          \
    \              oo         oo         oo         oo         oo         oo         oo          oo         oo         oo         oo         oo         oo         oo          \
     ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 byte per inch -- good thing the ROMs were small then!
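
The BNPF encoding described above can be sketched in a few lines of Python. This is only an illustration: the function names are invented, and the bit order (most significant bit first) is an assumption, since the description above does not say which end comes first.

```python
import re


def bnpf_encode(data: bytes) -> str:
    """Encode each byte as B, eight N/P characters, then F."""
    words = []
    for byte in data:
        bits = format(byte, "08b")  # MSB first: an assumption
        words.append("B" + bits.replace("0", "N").replace("1", "P") + "F")
    return " ".join(words)


def bnpf_decode(text: str) -> bytes:
    """Decode BNPF text, ignoring anything between the B...F groups."""
    out = bytearray()
    for bits in re.findall(r"B([NP]{8})F", text):
        out.append(int(bits.replace("N", "0").replace("P", "1"), 2))
    return bytes(out)
```

Each byte becomes 10 characters, which matches the "1 byte per inch" remark if the tape holds 10 characters per inch.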

We still use ASCII hexadecimal to send ROM images to programming houses.

--Mark

Re: Ascii Codes instead of true binary?

ybakos
Killer ascii art. I always love your deeper perspective.

Re: Ascii Codes instead of true binary?

matt.kroger
In reply to this post by cadet1620
I absolutely agree that assembly to machine code is a simple translation (especially with this assembler design, which I think really succeeds at being powerful and representative while staying simple, understandable, and straightforward to implement), and that outputting ASCII representations of binary code does "simulate" what's being done in a very straightforward fashion. I do think a hex editor like HxD (I'm not trying to recommend a product here, just give an example) can show the same view of a true binary file (you can configure it to display binary, two bytes per line), and working that way would still require you to learn about bitwise operations and the fact that there are no newlines in a binary file.
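
For anyone curious what the "true binary" version would look like, here is a minimal Python sketch that packs the text lines of a .hack file into raw bytes with shifts and ORs. The function names are invented for this example, and the big-endian byte order is an assumption: the course tools only consume the text format, so nothing defines a binary layout.

```python
import struct


def pack_word(bits: str) -> int:
    """Build a 16-bit integer from 16 '0'/'1' characters using shift and OR."""
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError(f"expected 16 binary digits, got {bits!r}")
    word = 0
    for ch in bits:
        word = (word << 1) | int(ch == "1")
    return word


def hack_text_to_bytes(text: str) -> bytes:
    """Convert .hack text to raw bytes: two bytes per instruction, no newlines."""
    words = [pack_word(line.strip()) for line in text.splitlines() if line.strip()]
    # ">H" is a big-endian unsigned 16-bit value; the byte order is an assumption.
    return struct.pack(f">{len(words)}H", *words)
```

Two instructions come out as exactly four bytes, with nothing separating them.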

But I'm not actually arguing against the decision to implement things this way. I think I got used to Nisan and Schocken's excellent "Perspective" sections for each chapter, which so often give a lot of good background and an idea of "where you need to go from here if you want to study this more". I was simply frustrated that I had spent a lot of time implementing bitwise operations and maps based on numbers rather than symbols, and now have to refactor (read: "delete") a lot of code.

Thanks for the note about UTF8 - I don't believe the text editor I've been using outputs utf8, but that could have been a source of confusion.

And overall, I hope it's understood that I love this project! There are pieces I'm beyond and pieces I've had no exposure to, and the idea of an end-to-end project like this, one that simplifies in many ways but gives hands-on experience and, more importantly, encourages you to explore deeper from the ground up, is perfect. We need more courses like this!

-Matt