Henoktes722 wrote
Thank you, really great answer.
But we may have many options that we can't extend in addition to space and '('. Like in this example
x<5
we can't extend operators like '<' in this example, so if that is the case, won't we have many options that we can't extend?
For each type of token (and for each partial string at a given point) we have a specific set of characters that can be used to extend that token. Any other character can't.
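To make that concrete, here is a minimal Python sketch of the "can this character extend this token?" test. The function name can_extend and the character sets are my own illustration, not something from the course materials, and it assumes a Jack-like language in which every symbol is a single character. Keywords are spelled like identifiers, so the sketch treats them as identifiers while scanning; string constants and comments are left out.

```python
import string

# Assumed character sets for a Jack-like language (illustrative only).
IDENT_CHARS = set(string.ascii_letters + string.digits + "_")
DIGITS = set(string.digits)

def can_extend(token_type, ch):
    """Return True if ch may be appended to a partial token of token_type."""
    if token_type == "identifier":
        return ch in IDENT_CHARS   # letters, digits, underscore
    if token_type == "integerConstant":
        return ch in DIGITS        # digits only
    if token_type == "symbol":
        return False               # assumption: symbols are one character long
    raise ValueError(f"unknown token type: {token_type!r}")
```

So the list of characters that can't extend a token may be long, but you never have to enumerate it: you only check membership in the (short) set of characters that can.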
So in the case of x<5 we start out with "" (i.e., an empty string), which is compatible with any type of token. We then read 'x', which is only compatible with identifiers (since no keyword starts with an 'x'). So now we have "x" and next we read '<'. This can't be used to extend an identifier, so we have found the first token
identifier: "x"
Now start our next token with the character that couldn't be used, namely "<". This is only compatible with a symbol token. The next character, '5', can't extend this so we have
symbol: "<"
Now start our next token with the character that couldn't be used, namely "5". This is only compatible with an integerConstant (and this is why identifiers can't start with a digit -- we want to limit the number of possible types as quickly as possible). The next character is the end-of-line character, which can't extend an integerConstant, so we have
integerConstant: "5"
Now start our next token with the character that couldn't be used, namely the end-of-line character. No token can start with this, but we are allowed to discard it and move on to the next character.
Just keep going like this until you get to the end of the file.
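The whole walk-through can be sketched as a short Python loop. This is only an illustration under some assumptions: the names (tokenize, SYMBOLS, KEYWORDS and so on) are mine, the symbol and keyword lists are the Jack ones as I understand them, and string constants and comments are deliberately left out to keep it short.

```python
import string

# Assumed Jack symbol and keyword sets (illustrative).
SYMBOLS = set("{}()[].,;+-*/&|<>=~")
KEYWORDS = {
    "class", "constructor", "function", "method", "field", "static", "var",
    "int", "char", "boolean", "void", "true", "false", "null", "this",
    "let", "do", "if", "else", "while", "return",
}
DIGITS = set(string.digits)
IDENT_START = set(string.ascii_letters + "_")
IDENT_CHARS = IDENT_START | DIGITS

def tokenize(text):
    """Split text into (type, lexeme) pairs, extending each token as far as possible."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1                      # no token starts with whitespace: discard it
            continue
        if ch in SYMBOLS:
            tokens.append(("symbol", ch))   # symbols can never be extended
            i += 1
            continue
        # The first character fixes the token type and its extension set.
        if ch in DIGITS:
            kind, allowed = "integerConstant", DIGITS
        elif ch in IDENT_START:
            kind, allowed = "identifier", IDENT_CHARS
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
        start = i
        i += 1
        while i < len(text) and text[i] in allowed:
            i += 1                      # keep extending while the next character fits
        lexeme = text[start:i]
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"            # keywords look like identifiers until complete
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("x<5\n"))
# → [('identifier', 'x'), ('symbol', '<'), ('integerConstant', '5')]
```

Note that the character that ends one token is not consumed by the inner loop, so the outer loop picks it up as the start of the next token, exactly as in the steps above.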