Multiple occurrences of an identical string

neophyte
Hello,

A post titled "Nand2Tetris: string equality test isn't working" on another website suggests there is a problem with the way Jack treats and compiles strings. Specifically, each occurrence of an identical string literal is treated as a distinct object with its own base address. This leads to counterintuitive results. Consider the following class (borrowed from that post):

class Main {
  function void main() {
    var String foo;
    let foo = "bar";

    if (foo = "bar") {
      do Output.printString("true");
    }
    else {
      do Output.printString("false");
    }

    return;
  }
}

This class outputs 'false' where one naively expects the output 'true'.  This counterintuitive result occurs because the second occurrence of "bar" is assigned a distinct base address by the Jack compiler.

So my question is simply this: Am I correct in viewing this deviation from the conventional semantics of identity statements and expressions involving strings as a defect of Jack's treatment/compiling of strings?  And if not, why not?

Thanks

Re: Multiple occurrences of an identical string

WBahn
Administrator
No, it's not a defect. It's part of the language specification, though not specifically called out.

You actually see this same behavior in many languages, in which two object references compare equal if and only if they refer to the same object. If you want to compare the contents of two different objects, you have to use a function specifically intended for that purpose. In C, for strings, that function is strcmp(). Some languages try to make the semantics match human intuition more closely and do a pretty good job for simple structures like strings, but even in these you quickly run into the same issue with more advanced data structures.
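
For illustration, a content comparison along these lines could be sketched in Jack roughly as follows. The class name StringUtil and the function name equals are hypothetical; nothing like this ships with the Jack OS. The sketch relies only on the standard String methods length() and charAt().

```jack
/** Hypothetical helper class; not part of the Jack OS. */
class StringUtil {

  /** Returns true if a and b hold the same characters in the same order. */
  function boolean equals(String a, String b) {
    var int i, len;

    let len = a.length();

    // Strings of different lengths cannot have equal contents.
    if (~(len = b.length())) {
      return false;
    }

    // Compare character by character, stopping at the first mismatch.
    let i = 0;
    while (i < len) {
      if (~(a.charAt(i) = b.charAt(i))) {
        return false;
      }
      let i = i + 1;
    }

    return true;
  }
}
```

With such a helper, the test in the original example would read if (StringUtil.equals(foo, "bar")) and would take the "true" branch. Note that the literal passed as the second argument is still a freshly allocated String object on every call.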

What IS a defect in the language design is that Jack allows you to use string literals in places that prevent you from cleaning them up, leading to a memory leak.

Consider the code you gave. If you don't free up foo before exiting the function, the memory allocated to "bar" is lost. In this case, that's on the programmer because they allocated the memory and didn't deallocate it. But if you call a function that takes a string argument, Jack allows you to pass a string literal. The called function generally doesn't (and shouldn't) deallocate the memory, and once the function returns there's no way to deallocate it.
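
A minimal sketch of the two cases, assuming the standard Jack OS String.dispose() method and Output.printString(); the variable names and literal contents are made up for the example:

```jack
class Main {
  function void main() {
    var String foo, msg;

    // Each literal is compiled into a call that allocates a new String.
    let foo = "bar";
    let msg = "done";

    do Output.printString(msg);

    // Literals held in variables can be released explicitly.
    do foo.dispose();
    do msg.dispose();

    // This literal is never stored anywhere the caller can reach, so
    // nothing can dispose of it once printString returns.
    do Output.printString("leaked");

    return;
  }
}
```

The last call is the problematic pattern: the language accepts it, but the String object it creates stays allocated for the rest of the program.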

Real programming languages face the same issue and handle it in different ways. For languages that have automatic garbage collection, it is just incorporated into the normal garbage collection process. For languages that don't, a common way of dealing with it is for the compiler to incorporate all of the literal strings in the code into the code itself (they've got to be there one way or the other) and not put them into data memory at all. This can have some subtle effects and can also be a potential security vulnerability.

Re: Multiple occurrences of an identical string

neophyte
Thanks for the quick and illuminating response.

At first sight, the lack of conventional identity statements seems like a limitation on the Jack language's expressive power.  But the expedient you mention of a special function to compare equality of contents would compensate.  And I gather that, in principle, this could readily be defined in Jack and included in its Standard Library.  (Although at present this is only an assumption on my part, since I haven't yet read Ch. 12 on the Operating System.)

Re: Multiple occurrences of an identical string

WBahn
Administrator
You're welcome. And your assumption is correct -- it could readily be incorporated into the String library. It's not, since such a function isn't needed to complete the projects and the authors have been pretty ruthless in trimming out anything that isn't absolutely necessary.