
Why does JackTokenizer need a string without spaces?


kraftwerk1611


Hi,

1- I am at the initial stage of writing a Jack compiler, so I am trying to find answers to some basic questions.

After reading the TECS book and looking at the forum posts for Project 10, it seems we are expected to read the contents of *.jack files without any spaces between characters. Maybe it is not clear to me because I don't have the whole picture yet, but why is it important or required to have strings without spaces, like

"staticbooleantest;" instead of "static boolean test;"?

Why is reading a Jack file word by word and matching the words against the list of lexical elements not recommended?

2- Also, what should the initial steps for writing a compiler be? I have 3 classes: Analyser, Tokenizer and CompilationEngine. I can read a Jack file line by line, with or without spaces. Should I now read these long strings character by character and match them against the lexical elements given in Fig 10.5 on page 208?

3- Is it necessary to use regular expressions to write the compiler? I am using Python 2.7 for Project 10. Is this Python version too old for it?

4- Also, I don't see 'Array' in the list of lexical elements, but it appears in *.jack files. What should I do when the string 'Array' is encountered while reading a Jack file?

Thanks for any replies.

cadet1620


Administrator

1) You don't want to eliminate whitespace (spaces, tabs, newlines) before tokenizing. You do want to skip whitespace in Tokenizer.advance().

For the sequence "static boolean test;" your tokenizer should return
keyword "static"
keyword "boolean"
identifier "test"
symbol ";"
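
To see why the whitespace has to stay in the input until tokenizing time, here is a minimal sketch (not the code from my compiler). The function name split_lexemes is made up for this example, and it assumes input with no comments or string constants; it only splits the text into raw pieces and does not tag them yet.

```python
def split_lexemes(text):
    """Split text into raw lexemes, skipping whitespace between them.

    Hypothetical helper for illustration: no comments, no string constants.
    """
    lexemes = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1                  # whitespace separates tokens but is not one
        elif ch.isalnum() or ch == "_":
            start = i
            # Consume the whole run of letters, digits and underscores.
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1
            lexemes.append(text[start:i])
        else:
            lexemes.append(ch)      # anything else: a one-character symbol
            i += 1
    return lexemes


print(split_lexemes("static boolean test;"))   # ['static', 'boolean', 'test', ';']
print(split_lexemes("staticbooleantest;"))     # ['staticbooleantest', ';']
```

With the spaces removed, the three words merge into one long name and there is no way to get "static" and "boolean" back out of it.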

2) In my compiler, Analyser creates a new CompilationEngine, passing it the names of the source and VM files, and optionally the XML file. CompilationEngine creates a new Tokenizer passing it the name of the source file.
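
A rough sketch of that wiring could look like the following. Only the three class names and what gets passed to whom come from the description above; the method name analyze, the write_xml flag, the attribute names, and deriving the output names from the source name are assumptions made for the example.

```python
import os


class Tokenizer(object):
    def __init__(self, source_name):
        # Read every line of the source file up front.
        with open(source_name) as source:
            self.lines = source.read().splitlines()


class CompilationEngine(object):
    def __init__(self, source_name, vm_name, xml_name=None):
        self.vm_name = vm_name
        self.xml_name = xml_name    # None means no XML output was requested
        # The engine owns the tokenizer and pulls tokens from it as needed.
        self.tokenizer = Tokenizer(source_name)


class Analyzer(object):
    def analyze(self, source_name, write_xml=False):
        # Assumption: output files sit next to the source, e.g. Main.jack -> Main.vm
        base, _ = os.path.splitext(source_name)
        xml_name = base + ".xml" if write_xml else None
        return CompilationEngine(source_name, base + ".vm", xml_name)
```

The point of this arrangement is that Analyzer never talks to Tokenizer directly; CompilationEngine asks for tokens one at a time while it parses.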

Tokenizer opens the file and reads all the lines into a string array. advance() does something like

    while not end-of-lines
        if currentLine.length == 0
            currentLine = next line
        skipWhitespace()          // trims whitespace from the front of currentLine
        if currentLine[0] is alpha or '_'
            parseKeywordOrIdent() // parses the name, sets up the Tokenizer return values, removes it from currentLine
        else if currentLine[0] is numeric
            parseNumber()
        ...
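
The advance() outline above could be turned into runnable Python roughly like this. It is a sketch under assumptions, not the actual source: the attribute and helper names are invented, the lines are passed in as a list rather than read from a file, the loop trims before checking for an empty line so that whitespace-only lines are handled, and the branches elided by "..." (string constants, comments) are left out. Treating every other character as a one-character symbol is a simplification.

```python
KEYWORDS = frozenset("""class constructor function method field static var int char
boolean void true false null this let do if else while return""".split())


class Tokenizer(object):
    def __init__(self, lines):
        self.lines = list(lines)    # all source lines, read up front
        self.next_index = 0         # index of the next unread line
        self.current_line = ""
        self.token_type = None
        self.token = None

    def advance(self):
        """Load the next token; return False once the input is exhausted."""
        while True:
            # skipWhitespace(): trim whitespace from the front of the line
            self.current_line = self.current_line.lstrip()
            if self.current_line:
                break
            if self.next_index >= len(self.lines):
                self.token_type = self.token = None
                return False
            self.current_line = self.lines[self.next_index]
            self.next_index += 1
        first = self.current_line[0]
        if first.isalpha() or first == "_":
            self._parse_keyword_or_ident()
        elif first.isdigit():
            self._parse_number()
        else:
            # Simplification: everything else is a one-character symbol.
            self._take(1, "symbol")
        return True

    def _take(self, length, token_type):
        # Record the token and remove it from the front of the current line.
        self.token_type = token_type
        self.token = self.current_line[:length]
        self.current_line = self.current_line[length:]

    def _parse_keyword_or_ident(self):
        end = 1
        while end < len(self.current_line) and (
                self.current_line[end].isalnum() or self.current_line[end] == "_"):
            end += 1
        name = self.current_line[:end]
        self._take(end, "keyword" if name in KEYWORDS else "identifier")

    def _parse_number(self):
        end = 1
        while end < len(self.current_line) and self.current_line[end].isdigit():
            end += 1
        self._take(end, "integerConstant")


tokenizer = Tokenizer(["static boolean test;", "", "  var Array a;"])
tokens = []
while tokenizer.advance():
    tokens.append((tokenizer.token_type, tokenizer.token))
```

After the loop, tokens holds the four tokens listed in 1) followed by keyword "var", identifier "Array", identifier "a" and symbol ";". Any name that is not in the keyword set comes out as an identifier.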

3) I don't use regex. Python 2.7 is fine. My compiler runs without modification on 2.7 and 3.x.

4) Array is an identifier, as are all other class and variable names.

--Mark

kraftwerk1611


Thank you.

I guess I have to look at the responsibilities of the different objects again. Until now I had been looking at it like this:

-Analyzer initiates the whole process and asks Tokenizer to tokenize the input file.
-Tokenizer tokenizes the input file, tags the tokens and keeps them ready for CompilationEngine to use.
-CompilationEngine uses the services of Tokenizer and creates the output XML file.

So it seems that the methods given in section 10.3 of the book are all we need to implement. No more methods are required.

Thanks.

cadet1620


Administrator


kraftwerk1611 wrote

So it seems that the methods given in section 10.3 of the book are all we need to implement. No more methods are required.

You will want to implement more methods for purposes of code organization and readability.

For example, CompilationEngine.compileClass() would be a giant mess if you tried to write it as a single function. You will want to write functions for most of the syntactic elements in the Jack syntax.

Also note that there is recursion in the syntax. CompileExpression will end up indirectly calling itself to handle expressions like a[b[2]], where each subscript is itself an expression.
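
To make the recursion concrete, here is a small sketch that handles only a cut-down grammar: an expression is a term followed by any number of (op term) pairs, and a term is a number, a name, or a name with a subscript. The class name ExpressionParser, the nested-tuple result and the depth counters are assumptions made for the example (Project 10 writes XML instead), and the other Jack term forms such as subroutine calls, parentheses, unary operators and string constants are left out.

```python
class ExpressionParser(object):
    OPS = set("+-*/&|<>=")

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.depth = 0          # current nesting of compile_expression calls
        self.max_depth = 0      # deepest nesting seen so far

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError("expected %r but found %r" % (expected, token))
        self.pos += 1
        return token

    def compile_expression(self):
        # expression: term (op term)*
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        node = self.compile_term()
        while self.peek() in self.OPS:
            op = self.eat()
            node = (op, node, self.compile_term())
        self.depth -= 1
        return node

    def compile_term(self):
        # term: integerConstant | varName | varName '[' expression ']'
        token = self.eat()
        if self.peek() == "[":
            self.eat("[")
            index = self.compile_expression()   # indirect recursion happens here
            self.eat("]")
            return ("index", token, index)
        return token


parser = ExpressionParser(["a", "[", "b", "[", "2", "]", "]"])
tree = parser.compile_expression()
print(tree)               # ('index', 'a', ('index', 'b', '2'))
print(parser.max_depth)   # 3
```

For a[b[2]], compile_expression is active three levels deep at once: once for the whole expression, once for the subscript b[2], and once for the subscript 2.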

--Mark