In the TypeScript compiler, src/compiler/parser.ts contains the following function, where identifiers is a Map from strings to strings:

function internIdentifier(text: string): string {
    let identifier = identifiers.get(text);
    if (identifier === undefined) {
       identifiers.set(text, identifier = text);
    }
    return identifier;
}

This has the same behavior as the identity function for strings:

const id = (text: string) => text

I assume it's there for performance. How could this improve performance? I'm asking because:

  • I think JS VMs already intern strings (but I haven't found evidence yet).
  • The code doesn't seem to save on string creation: one must already have created a string (text) in order to look up the same string in the map.
Max Heiber

1 Answer

It saves on memory. Take the following example:

const s1 = readFromFile();
const s2 = readFromFile();
const s3 = readFromFile();

How many different string objects do you have in memory? Three (assuming each call allocates a new string), even though all of them contain the same characters.

Now take the following:

const s1 = internIdentifier(readFromFile());
const s2 = internIdentifier(readFromFile());
const s3 = internIdentifier(readFromFile());

How many different string objects do you have in memory? Just 1. All three variables refer to the same string object.
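
To make this concrete, here is a minimal sketch. It assumes a hypothetical readToken helper standing in for readFromFile() and a standalone identifiers map rather than the compiler's real one. Object identity of strings can't be observed from script (=== compares by value), so the only thing the sketch can show directly is that the map retains a single entry; whether the engine actually frees the duplicates is up to the VM's garbage collector.

```typescript
// Standalone stand-in for the parser's identifier table (an assumption,
// not the compiler's actual instance).
const identifiers = new Map<string, string>();

function internIdentifier(text: string): string {
    let identifier = identifiers.get(text);
    if (identifier === undefined) {
        // First time we see these characters: remember this string.
        identifiers.set(text, identifier = text);
    }
    // Otherwise return the string stored earlier and let `text` be collected.
    return identifier;
}

// Hypothetical helper: produces a string at run time, the way a scanner
// slices tokens out of a source file.
function readToken(source: string, start: number, end: number): string {
    return source.slice(start, end);
}

const source = "count count count";
const s1 = internIdentifier(readToken(source, 0, 5));
const s2 = internIdentifier(readToken(source, 6, 11));
const s3 = internIdentifier(readToken(source, 12, 17));

// Equal by value in any case; interning does not change this result.
console.log(s1 === s2 && s2 === s3);
// Number of distinct strings the table keeps alive.
console.log(identifiers.size);
```

The three strings produced by readToken are only needed for the lookup. After the calls return, nothing but the map entry has to stay reachable.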

JB Nizet
  • Sorry, I'm still confused. Did you see the two bullet points at the bottom? - (1) JS VMs already intern strings - (2) The code doesn't seem to save on string creation. One must create a string (text) in order to look up the same string in the map. I guess if (1) isn't true, then the answer to (2) could be that the extra strings will get GC'd so eventually there will be memory savings? – Max Heiber Jul 03 '19 at 15:29
  • @MaxHeiber JS VMs intern **some** strings (literals only, AFAIK). If you read and parse a file (what compilers do), all the tokens you read won't be interned. And yes, the strings returned by the two calls to readFromFile() will be GCed. – JB Nizet Jul 03 '19 at 15:34
  • See https://stackoverflow.com/questions/5276915/do-common-javascript-implementations-use-string-interning – JB Nizet Jul 03 '19 at 15:35
  • BTW, this probably also helps performance, since every equality check between two equal interned strings can result in an identity check (i.e. both pointers are equal). – JB Nizet Jul 03 '19 at 15:39
  • I saw the other post you linked to before (https://stackoverflow.com/questions/5276915/do-common-javascript-implementations-use-string-interning). No one cites any sources or links to source code. – Max Heiber Jul 03 '19 at 19:08
  • @max: which source code should be quoted? AFAIK, nothing in the ES standard requires that strings be interned, so the fact that interning is done or not by some implementation proves nothing about other ones. Also, you assert that JS does intern strings, but you don't provide a source or link either :-) Just sayin' – rici Jul 04 '19 at 00:04
  • @rici point taken, updated to add 'I think'. The source that (ideally) would be quoted is at least one of V8, SpiderMonkey, Chakra, or JSCore source. – Max Heiber Jul 04 '19 at 16:06