Why does JackTokenizer need a string without spaces?

kraftwerk1611
Hi,

1- I am at the initial stage of writing a Jack compiler, so I am trying to find answers to some basic questions.

After reading the TECS book and looking at the Project 10 forum posts, it seems we are expected to read the contents of *.jack files without any spaces between characters. Maybe this is unclear to me because I don't have the whole picture yet, but why is it important or required to have strings without spaces, like

"staticbooleantest;"  instead of "static boolean test;"?

Why is reading a Jack file word by word and matching the words against the list of lexical elements not recommended?

2- Also, what should the initial steps for writing a compiler be? I have 3 classes: Analyser, Tokenizer and CompilationEngine. I can read a Jack file line by line, with or without spaces. Should I now read these long strings character by character and match them against the lexical elements given in Fig. 10.5 on page 208?

3- Is it necessary to use regular expressions to write the compiler? I am using Python 2.7 for Project 10. Is this Python version too old for it?

4- Also, I don't see 'Array' in the list of lexical elements, but it appears in *.jack files. What should I do when the string 'Array' is encountered while reading a Jack file?

Thanks for any replies.

Re: Why does JackTokenizer need a string without spaces?

cadet1620
Administrator
1) You don't want to eliminate whitespace (spaces, tabs, newlines) before tokenizing. You do want to skip whitespace in Tokenizer.advance().

For the sequence "static boolean test;" your tokenizer should return
    keyword "static"
    keyword "boolean"
    identifier "test"
    symbol ";"
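To illustrate why neither stripping all whitespace nor plain word-by-word splitting is enough, here is a small Python sketch (not from the original post; the helper name split_on_whitespace is made up for this example). Whitespace separates adjacent names, but symbols are often written with no spaces around them, so the tokenizer has to scan characters and use whitespace only as a separator to skip.

```python
def split_on_whitespace(line):
    """Naive 'word by word' reading: split the line on runs of whitespace."""
    return line.split()


# A symbol stays glued to the word it follows.
print(split_on_whitespace("static boolean test;"))
# ['static', 'boolean', 'test;']

# Code written without spaces around symbols is not separated at all.
print(split_on_whitespace("let x=a[i]+1;"))
# ['let', 'x=a[i]+1;']

# Removing whitespace first loses the boundaries between names:
# nothing in the result says where "static" ends and "boolean" begins.
print("static boolean test;".replace(" ", ""))
# staticbooleantest;
```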

2) In my compiler, Analyser creates a new CompilationEngine, passing it the names of the source and VM files, and optionally the XML file.  CompilationEngine creates a new Tokenizer, passing it the name of the source file.
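The ownership chain described here could be wired up roughly as follows. This is only a sketch under assumptions, not the actual compiler: the output_paths helper, the write_xml flag and the attribute names are invented for illustration, and the compile methods and the tokenizing itself are left out.

```python
import os


class Tokenizer(object):
    """Reads the whole source file up front (tokenizing itself is omitted)."""

    def __init__(self, source_path):
        with open(source_path) as source_file:
            self.lines = source_file.read().splitlines()


class CompilationEngine(object):
    """Owns the Tokenizer; the compileXxx methods would live here."""

    def __init__(self, source_path, vm_path, xml_path=None):
        self.tokenizer = Tokenizer(source_path)
        self.vm_path = vm_path
        self.xml_path = xml_path  # None means "do not write XML"


def output_paths(source_path):
    """Hypothetical helper: derive Foo.vm and Foo.xml from Foo.jack."""
    base, _extension = os.path.splitext(source_path)
    return base + ".vm", base + ".xml"


class Analyser(object):
    """Entry point: builds one CompilationEngine per source file."""

    def __init__(self, source_path, write_xml=False):
        vm_path, xml_path = output_paths(source_path)
        self.engine = CompilationEngine(
            source_path, vm_path, xml_path if write_xml else None)
```

With this arrangement only CompilationEngine talks to the Tokenizer, which matches the description of the engine pulling tokens as it needs them.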

Tokenizer opens the file and reads all the lines into a string array.  advance() does something like
    while not at end of lines
        skipWhitespace()      // trims whitespace from front of currentLine
        if currentLine.length == 0
            currentLine = next line
            continue          // re-check: the new line may be blank or start with whitespace
        if currentLine[0] is alpha or '_'
            parseKeywordOrIdent()     // parses name, sets up Tokenizer return values, removes it from currentLine
        else if currentLine[0] is numeric
            parseNumber()
        ...

3) I don't use regex. Python 2.7 is fine.  My compiler runs without modification on 2.7 and 3.x.

4) Array is an identifier, as are all other class and variable names.

--Mark

Re: Why does JackTokenizer need a string without spaces?

kraftwerk1611
Thank you.

I guess I have to look at the responsibilities of the different objects again. Until now I had been looking at it like this:

-Analyzer initiates the whole process and asks Tokenizer to tokenize the input file.
-Tokenizer tokenizes the input file, tags the tokens and keeps them ready for CompilationEngine to use.
-CompilationEngine uses the services of Tokenizer and creates the output XML file.


So it seems that the methods given in section 10.3 of the book are all we need to implement. No more methods are required.

Thanks.

Re: Why does JackTokenizer need a string without spaces?

cadet1620
Administrator
kraftwerk1611 wrote:
> So it seems that the methods given in section 10.3 of the book are all we need to implement. No more methods are required.

You will want to implement more methods for purposes of code organization and readability.

For example, CompilationEngine.compileClass() would be a giant mess if you tried to write it as a single function. You will want to write functions for most of the syntactic elements in the Jack syntax.

Also note that there is recursion in the syntax. compileExpression() will end up indirectly calling itself to handle expressions like a[b[2]], where each subscript is itself an expression.
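A cut-down Python sketch of that indirect recursion, for illustration only. It works on a ready-made list of token strings and covers just integer constants, variable names, array subscripts and binary operators; the Parser class, its method names and the nested-list output format are assumptions for this example, not the book's API or XML format.

```python
OPS = frozenset("+-*/&|<>=")


class Parser(object):
    """Tiny recursive-descent parser for a subset of Jack expressions."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it, or None at the end."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect(self, symbol):
        token = self.next()
        if token != symbol:
            raise SyntaxError("expected %r, got %r" % (symbol, token))

    def compile_expression(self):
        # expression: term (op term)*
        node = ["expression", self.compile_term()]
        while self.peek() in OPS:
            node.append(self.next())
            node.append(self.compile_term())
        return node

    def compile_term(self):
        # term (subset): integerConstant | varName | varName '[' expression ']'
        token = self.next()
        if token.isdigit():
            return ["integerConstant", token]
        if self.peek() == "[":
            self.next()
            index = self.compile_expression()  # indirect recursion happens here
            self.expect("]")
            return ["arrayAccess", token, index]
        return ["identifier", token]


tree = Parser(["a", "[", "b", "[", "2", "]", "]"]).compile_expression()
print(tree)
# ['expression', ['arrayAccess', 'a', ['expression', ['arrayAccess', 'b',
#   ['expression', ['integerConstant', '2']]]]]]
```

compile_expression() never calls itself directly; it calls compile_term(), which calls compile_expression() again for each subscript, so the nesting depth of the output follows the nesting depth of the source.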

--Mark