
Well, I have been looking at ways to generate UIDs in Java code (most of them leading to Stack Overflow, too). The best option is to use Java's UUID class to create unique IDs, since it uses the timestamp. But my problem is that a UUID is 128 bits long and I need a shorter string, say 14 or 15 characters. So I devised the following code to do so.

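    import java.util.Date;
    import java.util.Random;

    public class ShortUid {
        public static void main(String[] args) {
            Date date = new Date();
            long luid = date.getTime();             // milliseconds since 1970-01-01 UTC
            String suid = Long.toString(luid);      // currently 13 digits
            System.out.println(suid + ": " + suid.length() + " characters");

            Random rn = new Random();
            int long1 = rn.nextInt(9);              // random digit 0-8 to insert
            int long2 = rn.nextInt(suid.length());  // random insertion position

            String newstr = suid.substring(0, long2) + " " + long1 + " " + suid.substring(long2);
            System.out.println("New string in spaced format: " + newstr);
            System.out.println("New string in proper format: " + newstr.replaceAll(" ", ""));
        }
    }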

Note that I print the spaced and the unspaced string only to compare them with the original string.

Would this guarantee a 100% unique ID every time? Or do you see any way the numbers could repeat? Also, instead of inserting a random digit at a random position, which "might" create duplicate numbers, I could add it at the beginning or the end; the point is only to pad the UID to the required length. That would probably not work, though, if you need a UID shorter than 13 characters.

Any thoughts?
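To test the "100% unique?" question concretely, here is a small sketch that repeats the same insertion step with fixed inputs instead of `Random`. The helper `insertDigit` and both timestamp values are hypothetical, chosen for illustration: two timestamps 100 ms apart, each with a digit inserted at a position the code above could pick, produce the same 14-character string. So the scheme can collide even when the clock never repeats.

```java
public class InsertionCollision {
    // Hypothetical helper mirroring the question's code:
    // insert one digit into the timestamp string at the given position.
    static String insertDigit(String timestamp, int digit, int position) {
        return timestamp.substring(0, position) + digit + timestamp.substring(position);
    }

    public static void main(String[] args) {
        // Two distinct (hypothetical) millisecond timestamps, 100 ms apart.
        String t1 = "1321039680234";
        String t2 = "1321039680134";
        String id1 = insertDigit(t1, 1, 10); // "1321039680" + "1" + "234"
        String id2 = insertDigit(t2, 2, 11); // "13210396801" + "2" + "34"
        System.out.println(id1);
        System.out.println(id2);
        System.out.println("Collision: " + id1.equals(id2)); // prints "Collision: true"
    }
}
```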

Rishi P
  • There is of course no guarantee that **any** ID is unique: after all, the set of IDs is finite, and you can pick as many times as you want. (This holds for 128-bit UUIDs as well, of course.) – Vlad Nov 11 '11 at 19:28
  • Moreover, the shorter your ID, the higher the probability of collision. So you ought to choose longer IDs if you want that probability to be low. – Vlad Nov 11 '11 at 19:30
  • But a timestamp would always be unique, unless you make two requests within a fraction of a second, which is very unlikely. Combining it with a random number should increase the chance of getting a unique ID every time. – Rishi P Nov 11 '11 at 19:37
  • You may be surprised, but timestamps are not unique either. :) How big is your timestamp type? 16 bits? Then among any 2^16 + 1 timestamp values there will be at least two equal ones. – Vlad Nov 11 '11 at 19:40
  • Well, mine is not a 'timestamp' in the exact sense of the word. It's Date.getTime(), which is the number of milliseconds since January 1, 1970 (UTC). This one would not repeat until some very high limit, would it? (Might be a silly question :P) – Rishi P Nov 11 '11 at 19:50
  • Well, `Date`'s underlying type seems to be `long`, so it can basically repeat only once all possible `long` values are exhausted. Of course, if you ask for one new value each millisecond, the number of milliseconds needed before the value repeats is 2^64, which [according to Google calculator](http://www.google.com/#q=2%5e64+milliseconds+in+years) is 584 554 531 years. (That must be practically enough.) – Vlad Nov 11 '11 at 20:07
  • Well, I realized that this is not as clever as I thought :( getTime() depends on the clock of your machine/server, so I guess this is just as susceptible to errors as any other UID scheme. In the end I think I will go the usual way: store the UIDs in a database and check each new one against it. – Rishi P Nov 11 '11 at 20:18
  • Well, in fact a lot of thought went into making UUIDs as "unique" as possible. (They use specific things like network addresses, timestamps and so on.) So it's hard to beat them with simple code. Anyway, good luck with your project! – Vlad Nov 11 '11 at 20:24
  • My first question would be: are you generating these IDs from multiple processes? If not, a simple monotonically increasing sequence would suffice; from @Vlad's research, a long should be plenty large. UUIDs were created to give a high probability of unique IDs generated on any machine at any time. – sceaj Nov 11 '11 at 20:39
  • Furthermore, since you mention a database, if these IDs are to be generated from within a single system, there are techniques that allow a monotonically increasing sequence to be handed out by multiple processes using synchronization at the database. Let me know if you want more information. – sceaj Nov 11 '11 at 20:42
  • @sceaj: Yes, I'm interested in more information. Please let me know. Thanks. – Rishi P Nov 12 '11 at 03:30
  • Too big for a comment, so I posted it as an answer. – sceaj Nov 13 '11 at 03:37
  • Is this part of a distributed system, or are you talking about a single process generating these UIDs? – Gray Nov 15 '11 at 23:15
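To put a rough number on Vlad's point that shorter IDs collide more often, the standard birthday-bound approximation can be computed directly. This sketch assumes IDs are drawn uniformly at random from 2^bits possible values (a version-4 UUID has 122 random bits), which is not exactly how the question's timestamp-based IDs are built; the count of one million IDs is an arbitrary example. For scale, a 14-digit decimal string can hold at most 10^14 ≈ 2^46.5 distinct values.

```java
public class CollisionOdds {
    // Birthday-bound approximation for n IDs drawn uniformly at random
    // from 2^bits possible values: P(collision) ~ 1 - exp(-n^2 / 2^(bits + 1)).
    static double collisionProbability(double n, int bits) {
        double space = Math.pow(2, bits);
        // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
        return -Math.expm1(-(n * n) / (2 * space));
    }

    public static void main(String[] args) {
        double n = 1_000_000; // hypothetical: one million IDs generated
        for (int bits : new int[] {32, 48, 64, 122}) {
            System.out.printf("%3d random bits: P(collision) ~ %.3g%n",
                    bits, collisionProbability(n, bits));
        }
    }
}
```

With a million IDs, 32 random bits make a collision near certain, while 64 bits keep it well below one in a million.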

2 Answers


This won't work in a distributed system, of course, but how about something like the following?

    import java.util.concurrent.atomic.AtomicLong;

    public class UniqueIdGenerator {
        private final AtomicLong uniqueId = new AtomicLong(0);
        // ...

        public long nextId() {
            // get a unique current-time-millis value
            long now;
            long prev;
            do {
                prev = uniqueId.get();
                now = System.currentTimeMillis();
                // make sure now is moving ahead and unique
                if (now <= prev) {
                    now = prev + 1;
                }
                // loop if someone else has updated the id
            } while (!uniqueId.compareAndSet(prev, now));

            // shuffle it
            return shuffleBits(now);
        }

        // Reverses the byte order of val (same result as Long.reverseBytes(val)).
        // The unsigned shift >>> avoids sign extension when the top bit is set.
        public static long shuffleBits(long val) {
            long result = 0;
            result |= (val & 0xFF00000000000000L) >>> 56;
            result |= (val & 0x00FF000000000000L) >>> 40;
            result |= (val & 0x0000FF0000000000L) >>> 24;
            result |= (val & 0x000000FF00000000L) >>> 8;
            result |= (val & 0x00000000FF000000L) << 8;
            result |= (val & 0x0000000000FF0000L) << 24;
            result |= (val & 0x000000000000FF00L) << 40;
            result |= (val & 0x00000000000000FFL) << 56;
            return result;
        }

        public static void main(String[] args) {
            long result = new UniqueIdGenerator().nextId();
            System.out.println("Result is " + Long.toHexString(result));
        }
    }

The bit shuffling could be improved to produce more change in the value from one ID to the next. You mentioned that you don't want the numbers to be sequential, but you didn't specify a requirement for full randomness.

Certainly not as good as a UUID, but faster than the database operation.
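One way to follow up on the idea of better bit shuffling, sketched here as an assumption rather than as Gray's method: replace the byte reversal with a bijective mixing function such as the SplitMix64 finalizer. Because each step is invertible on 64-bit values, distinct inputs (for example, the strictly increasing values from the `AtomicLong` loop) still map to distinct outputs, while consecutive inputs give very different-looking results. Rendering the result in unsigned base 36 keeps it to at most 13 characters, within the question's 14 to 15 character budget. The class and method names and the starting value are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class MixedIds {
    // SplitMix64 finalizer: xor-shift-right and multiply-by-odd-constant steps.
    // Each step is invertible modulo 2^64, so distinct inputs give distinct outputs.
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Unsigned base 36: at most 13 characters for any 64-bit value.
    static String toShortString(long id) {
        return Long.toUnsignedString(mix64(id), 36);
    }

    public static void main(String[] args) {
        long start = 1321039680234L; // hypothetical starting millisecond value
        Set<String> seen = new HashSet<>();
        for (long i = 0; i < 5; i++) {
            String s = toShortString(start + i);
            seen.add(s);
            System.out.println((start + i) + " -> " + s);
        }
        System.out.println("distinct: " + seen.size());
    }
}
```

The mapping is easy to invert, so it only hides the sequence from casual inspection; uniqueness still comes entirely from the input values.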

Gray

The easy way is to use database sequences if they are available. If they aren't, you can simulate them as follows:

  1. Create a table that has a column that will hold the maximum value used so far (initially 0). Some applications create multiple rows where each row controls a specific unique ID, but all you really need is one row. For this example assume the table structure is as follows:

    ID_TABLE
    ID_NAME    VARCHAR(40); -- Or whatever type is appropriate
    ID_VALUE   INTEGER; -- Or whatever type is appropriate
    
  2. Each process reserves a block of <n> values by doing the following:

    a. Begin Txn;
    b. Update ID_TABLE set ID_VALUE = ID_VALUE + <n> where ID_NAME = <name>;     
    c. Select ID_VALUE from ID_TABLE where ID_NAME = <name>;  
    d. Commit Txn;
    

    If this all completes successfully, you have just reserved the range (val - n + 1) through val, where val is the value returned by step c above.

  3. Each process hands out IDs from the range it reserved. If the process is multi-threaded it must provide synchronization to ensure each value is handed out at most once. When it exhausts its supply of values it goes back to step 2 and reserves more values. Note that not all values that are reserved are guaranteed to be used. If a process terminates without using all of the values it has reserved, the unused values are lost and never used.
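The reserve-a-block scheme above can be sketched in Java roughly as follows. The `IdTable` interface and its in-memory implementation are hypothetical stand-ins for steps 2a to 2d; a real version would run the UPDATE and SELECT in one database transaction (for example through JDBC). Step 3's synchronization is a `synchronized` method, so each value in a reserved range goes to at most one thread.

```java
import java.util.concurrent.ConcurrentHashMap;

public class BlockIdAllocator {
    // Hypothetical stand-in for steps 2a-2d: atomically add n to ID_VALUE
    // for the given ID_NAME and return the new value.
    interface IdTable {
        long reserve(String idName, int n);
    }

    // Hypothetical in-memory table, for illustration only.
    static class InMemoryIdTable implements IdTable {
        private final ConcurrentHashMap<String, Long> rows = new ConcurrentHashMap<>();

        public long reserve(String idName, int n) {
            return rows.merge(idName, (long) n, Long::sum); // returns the updated ID_VALUE
        }
    }

    private final IdTable table;
    private final String idName;
    private final int blockSize;
    private long next = 1; // next value to hand out
    private long max = 0;  // last value of the reserved range (next > max means exhausted)

    BlockIdAllocator(IdTable table, String idName, int blockSize) {
        this.table = table;
        this.idName = idName;
        this.blockSize = blockSize;
    }

    // Step 3: hand out values from the reserved range, reserving a new block when empty.
    synchronized long nextId() {
        if (next > max) {
            long val = table.reserve(idName, blockSize); // step 2
            next = val - blockSize + 1;                  // range is (val - n + 1) .. val
            max = val;
        }
        return next++;
    }

    public static void main(String[] args) {
        IdTable table = new InMemoryIdTable();
        BlockIdAllocator a = new BlockIdAllocator(table, "ORDER_ID", 10); // "process" A
        BlockIdAllocator b = new BlockIdAllocator(table, "ORDER_ID", 10); // "process" B
        System.out.println(a.nextId() + " " + b.nextId() + " " + a.nextId()); // prints "1 11 2"
    }
}
```

As in the description, values left in a block when a process exits are simply never used.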

sceaj
  • Yeah, this is good. But the thing is, all the IDs will now look very sequential, which I forgot to mention I didn't want. I wanted them to look a little random at first glance, as the timestamp does. Thanks for your help. – Rishi P Nov 14 '11 at 06:59
  • It would help you get good answers if your question were as precise as possible. – sceaj Nov 14 '11 at 17:49