Assuming a 32-bit OS and processor. Suppose we define INTERN to take a series of bytes with an upper limit on the number of bytes (say 128 bytes) and return an unsigned integer in [0, 2^32). Another function SYMBOL-NAME takes the unsigned integer and returns the associated bytes (or stuffs them in a buffer, or whatever).
How many strings could we reasonably intern? Could we get to 2^32 without going
to disk?
If the strings are not unique, yes. :)
ReplyDeleteOtherwise, no.
Indeed.
ReplyDeleteBut to make the problem interesting, let's assume we're talking about unique strings. I think you ought to be able to intern over 2^32 of them because you ought to be able to compress the text down to a size that fits in RAM. (Assume the string is about the size of a lexical word in English).