I have a SQL table which uses strings for a key. I need to convert that string (max. 18 Characters) to a unique (!) 4-byte integer using PHP. Can anyone help?
-
1How would you fit a 18-byte string into a 4-byte integer? – Pekka Feb 27 '12 at 18:15
-
No. There are not enough 4-byte integers to cover all possible strings. – Ignacio Vazquez-Abrams Feb 27 '12 at 18:15
-
18 characters string has 18 Bytes. You said "_max._ 18 characters", thus there are much more combinations, then the said 18 Bytes. Now you want to compress 18 Bytes into 4 Bytes. This may work for a while, but not long. – KingCrunch Feb 27 '12 at 18:17
-
1Even if you could guarantee that the input string only used a range of 64 characters, you could only guarantee uniqueness to 16 characters. – Joe Feb 27 '12 at 18:18
-
Then perhaps you could combine 4 of those compressed numbers together and compress them down (again) into one? Woo-hoo infinite lossless compression! – Rudu Feb 27 '12 at 18:19
-
3Not even Jon Skeet can do this. – webbiedave Feb 27 '12 at 18:20
-
You could look for a 4-byte hash function but uniqueness is not guaranteed, yet for many purposes it does the job, depends on your domain. Maybe do supplementary lookups on duplicates untill you converted all the data, if it fits. – clyfe Feb 27 '12 at 18:21
3 Answers
Unique? Not possible, sorry.
Let's take a closer look:
With 18 characters, even if we were assuming only the 128 possible characters of ASCII (7 bits), you'd get 128^18 possible strings (and I'm not even going into the possibility of shorter strings!), which is about 8E37 ( 8 and 37 zeroes ).
With a 4-byte integer, you're getting 256^4 possible integers, which is about 4E9 ( 4 billion ).
So, you have about 4E28 more strings than you have integers; you can't have an unique mapping.
Therefore, you'll definitely run into a collision as soon as you enter the 4294967297th key, but it is possible to run into one as soon as you enter more than one.

- 91,498
- 46
- 177
- 222
Keep a lookup-table of strings to integers. Everytime you encounter a new string you add it to the mapping table and assign it a new unique ID. This will work for about 2^32 strings which is probably enough.
There is no way to do this for more that 2^32 distinct strings.

- 168,620
- 35
- 240
- 369
-
Not unique, there will be a collision eventually. The OP is asking for unique mapping. – Piskvor left the building Feb 27 '12 at 18:24
-
This _is_ unique. For up to 2^32 keys. Just as I said. This is a practical answer to a theoretically unsolvable problem. It is correct. – usr Feb 27 '12 at 18:27
-
All right then, but haven't you reinvented the SQL's autoincrement key? – Piskvor left the building Feb 27 '12 at 19:07
-
I just answered what the OP was requesting. I do not know why he did not use an auto-incrementing value, but there surely is a reason. In fact I would use the following schema for the mapping table: (ID int auto_increment, StringValue nvarchar(400) unique). – usr Feb 27 '12 at 19:10
You can't. A four-byte integer can represent 2^32 = 4 billion values, which is not enough to hold your target space.
If you currently have less then 4 billion rows in the table, you could create a cross table that just assigns an incremental value to each. You'd be limited to 4 billion rows with this approach, but this may be fine for your situation.

- 50,515
- 8
- 78
- 98