3

I used this ULID example in a project where I not only needed the uniqueness offered by ULID but also its lexicographic sortability.

I discovered, however, that no matter how much I tried, I could not just get the ids generated in a loop sorted.

e.g.

class Test{
    public static void main(String[] args) {
            ArrayList<String> ulids = new ArrayList<>();
    
            for (int i = 0; i < 10; i++) {
               
                ulids.add(ULID.generate());
            }
    
            System.out.println("Original:\n..." + ulids);
    
            Collections.shuffle(ulids);
    
            System.out.println("Shuffled:\n..." + ulids);
    
            ulids.sort(new Comparator<String>() {
                @Override
                public int compare(String o1, String o2) {
                    return o1.compareTo(o2);
                }
            });
    
            System.out.println("Sorted:\n..." + ulids);
    
        }
}

Sample output:
Original:
...[01edrp4ng81d3mvkp8s7z19znm, 01edrp4ng872nwfj6b9fsxjkkd, 01edrp4ng86v07r6c9sh62ghr7, 01edrp4ng8bpfw3m2q8bynd5st, 01edrp4ng896t1qhsngrz3h251, 01edrp4ng8jne084nsw5saesfe, 01edrp4ng8w8qz9qtgy3958r1v, 01edrp4ng8fdn30qnr2ktddyz4, 01edrp4ng8ekj0vt393tw12x8j, 01edrp4ng80wacxxskgej5d8mm]
Shuffled:
...[01edrp4ng896t1qhsngrz3h251, 01edrp4ng8w8qz9qtgy3958r1v, 01edrp4ng86v07r6c9sh62ghr7, 01edrp4ng8bpfw3m2q8bynd5st, 01edrp4ng8fdn30qnr2ktddyz4, 01edrp4ng80wacxxskgej5d8mm, 01edrp4ng872nwfj6b9fsxjkkd, 01edrp4ng81d3mvkp8s7z19znm, 01edrp4ng8jne084nsw5saesfe, 01edrp4ng8ekj0vt393tw12x8j]
Sorted:
...[01edrp4ng80wacxxskgej5d8mm, 01edrp4ng81d3mvkp8s7z19znm, 01edrp4ng86v07r6c9sh62ghr7, 01edrp4ng872nwfj6b9fsxjkkd, 01edrp4ng896t1qhsngrz3h251, 01edrp4ng8bpfw3m2q8bynd5st, 01edrp4ng8ekj0vt393tw12x8j, 01edrp4ng8fdn30qnr2ktddyz4, 01edrp4ng8jne084nsw5saesfe, 01edrp4ng8w8qz9qtgy3958r1v]

I checked the implementation and figured that since time was a core factor in the generation of ULIDs, and also since the sensitivity of time used was the millisecond i.e. (System.currentTimeMillis()) , I could get them sorted by introducing some delay in my id generation loop.

I introduced a delay of about 5 milliseconds and the ids all came out sorted; e.g:

class TestWithMsDelay{

   public static void main(String[] args) {
        ArrayList<String> ulids = new ArrayList<>();

        for (int i = 0; i < 10; i++) {
            try {
                Thread.sleep(5L);
                ulids.add(ULID.generate());
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }

        System.out.println("Original:\n..." + ulids);

        Collections.shuffle(ulids);

        System.out.println("Shuffled:\n..." + ulids);

        ulids.sort(new Comparator<String>() {
            @Override
            public int compare(String o1, String o2) {
                return o1.compareTo(o2);
            }
        });

        System.out.println("Sorted:\n..." + ulids);

    }


}
Sample output:
Original:
...[2rjdme5a5h2ntcd20xq4z487tx, 2rjdme63a23ddsy0km21n6n34a, 2rjdme6pnrenx79zd3jj18est4, 2rjdme70bv45b648p82dbj584n, 2rjdme7d8gx9v9db66ftsxbmqq, 2rjdme7psqdykt24qfymn2e4ba, 2rjdme80as7t1h1rr00m676718, 2rjdme8rztp50bad6ktkhrfhk8, 2rjdme93ngkxkfmf6aegqxer9e, 2rjdme9ea04x22rpx2f3rp5gez]
Shuffled:
...[2rjdme7psqdykt24qfymn2e4ba, 2rjdme6pnrenx79zd3jj18est4, 2rjdme80as7t1h1rr00m676718, 2rjdme63a23ddsy0km21n6n34a, 2rjdme93ngkxkfmf6aegqxer9e, 2rjdme70bv45b648p82dbj584n, 2rjdme9ea04x22rpx2f3rp5gez, 2rjdme8rztp50bad6ktkhrfhk8, 2rjdme7d8gx9v9db66ftsxbmqq, 2rjdme5a5h2ntcd20xq4z487tx]
Sorted:
...[2rjdme5a5h2ntcd20xq4z487tx, 2rjdme63a23ddsy0km21n6n34a, 2rjdme6pnrenx79zd3jj18est4, 2rjdme70bv45b648p82dbj584n, 2rjdme7d8gx9v9db66ftsxbmqq, 2rjdme7psqdykt24qfymn2e4ba, 2rjdme80as7t1h1rr00m676718, 2rjdme8rztp50bad6ktkhrfhk8, 2rjdme93ngkxkfmf6aegqxer9e, 2rjdme9ea04x22rpx2f3rp5gez]

This is not good enough for my work... I don't want to wait for any length of time to generate ulids(even if it is 10us - 100us), the concept of artificial delay bothers me very much, lol.

So, I modified ULID.java and changed the time source from System.currentTimeMillis() to System.nanoTime()

To my surprise, I no longer needed any time delay in the loop to get the output ULIDs sortable.

I feel there must be a snag somewhere though; because the Java Spec warns that System.nanoTime() is not necessarily more accurate than System.currentTimeMillis()

e.g in the Javadoc for System.nanoTime() , it says:

This method provides nanosecond precision, but not necessarily nanosecond resolution (that is, how frequently the value changes) - no guarantees are made except that the resolution is at least as good as that of currentTimeMillis().

Also, the Javadoc for System.nanoTime() seems to indicate that it is not relative to the Epoch (as is System.currentTimeMillis())

I believe that this may cause bad behaviour(messed up sortability over time and maybe affect uniqueness) in using System.nanoTime() in ULID.java instead of System.currentTimeMillis()

QUESTION

  1. Are my fears legitimate
  2. What can I do if (1.) is true, to improve the ULID's time-sensitivity beyond 1 millisecond without destroying its strong points?
Saheb
  • 1,392
  • 3
  • 13
  • 33
gbenroscience
  • 994
  • 2
  • 10
  • 33

1 Answers1

2

The ULID has two parts: a time component and a random component.

The time component is the count of milliseconds since 1970.

The random component is updated in two cases:

  1. when the millisecond changes, a new random value is generated;
  2. when the millisecond is the same, the random value is incremented by one.

The implementation you show here doesn't do the second step.

Maybe you could include some code like this (just an example):

if (timestamp == previousTimestamp) {
    randomComponent++;
} else {
    randomComponent = RANDOM.nextLong();
}

Another problem that I found is that it uses Math.random(), instead of java.security.SecureRandom. To fix this, this is a sugestion:

import java.security.SecureRandom;
private static final RANDOM = new SecureRandom();

Finally, it's not recommended to use System.nanoTime() as it returns the number of nanoseconds since an arbitrary point in time. It's not the day time returned from the real time clock (RTC) in your mother board. This function is used to measure the elapsed time between two points in your code, maybe for benchmarking. Example:


long startNanos = System.nanoTime();

// do some expensive tasks here

long endNanos = System.nanoTime();

long elapsedNanos = endNanos - startNanos;

If you prefer, you can check the library ulid-creator. Maybe it can help. Example:

// Generate a ULID as UUID
UUID ulid = UlidCreator.getUlid();

// Or generate a ULID as String (Crockford's base32)
String ulid = UlidCreator.getUlidString();

Project page: https://github.com/f4b6a3/ulid-creator

EDIT

Sorry. I didn't answare the questions.

Are my fears legitimate

Yep, your wright.

What can I do if (1.) is true, to improve the ULID's time-sensitivity beyond 1 millisecond without destroying its strong points?

You can increase the ULID resolution, but it won't conform the ULID Spec (which is not a formal standard like RFC-4122 by the way). The resulting UUID is like a COMB GUID, created by Jimmy Wilson. The main idea on both is the same.

You can reserve more bits for timestamp component, but it will have a cost of some bits. For example, if you increase the time component from 48 to 64 bits, it will roll over around the year 2262 AD, but the random component will be reduced from 1208925819614629174706176 (2^80) to 18446744073709551616 (2^64). If the cost affects the strong points of ULID, it depends on your project.

I just implemented a generator for ULIDs with nanosecond resolution. I coincidently was working on it a few days ago. It actually has millisecond accuracy using the method System.currentTimeMillis(). The nanosecond resulution is simulated using the method System.nanoTime() between two subsequent calls.

If you still intend to use nanoseconds ULID, feel free to test it:

package your.package.name;

import java.security.SecureRandom;
import java.time.Instant;
import java.util.UUID;

/**
 * Utility class that creates a COMB GUID with nanoseconds resolution.
 * 
 * It borrows the main idea from ULID and COMB generators: a concatenation of
 * time and random bytes. It is composed of 64 bits for time and 64 for random
 * bits.
 * 
 * A Nano COMB has two components:
 * 
 * 1. Time camponent (64 bits): nanoseconds since 1970
 * 
 * 2. Random component (64 bits): a value generated by a secure random
 * generator.
 * 
 * Maximum time component year is ~2262 A.D. (2^63/10^9/60/60/24/365.25 + 1970)
 * 
 * @author: Fabio Lima 2020
 */
public final class NanoCombCreator {

    private long prevTime = 0;
    private long prevNano = 0;

    private static final long ONE_MILLION_NANOSECONDS = 1_000_000L;

    private static final SecureRandom SECURE_RANDOM = new SecureRandom();

    /**
     * Returns a time component in nanoseconds.
     * 
     * It uses `System.currentTimeMillis()` to get the system time in milliseconds
     * accuracy. The nanoseconds resolution is simulated by calling
     * `System.nanoTime()` between subsequent calls within the same millisecond.
     * It's not precise, but it provides some monotonicity to the values generates.
     * 
     * @return the current time in nanoseconds
     */
    private synchronized long getTimeComponent() {

        final long time = System.currentTimeMillis();
        final long nano = System.nanoTime();
        final long elapsed; // nanoseconds since last call

        if (time == prevTime) {
            elapsed = (nano - prevNano);
            if (elapsed > ONE_MILLION_NANOSECONDS) {
                try {
                    // make the clock to catch up
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    System.err.println("something went wrong...");
                }
            }
        } else {
            prevTime = time;
            prevNano = nano;
            elapsed = 0;
        }

        return (time * ONE_MILLION_NANOSECONDS) + elapsed;
    }

    /**
     * Returns the random component using a secure random generator.
     * 
     * @return a random value.
     */
    private synchronized long getRandomComponent() {
        return SECURE_RANDOM.nextLong();
    }

    /**
     * Returns a Nano COMB.
     * 
     * A Nano COMB is inspired on ULID and COMB generators.
     * 
     * It is composed of 64 bits for time and 64 for random bits.
     * 
     * @return a UUID
     */
    public synchronized UUID create() {

        final long timeBits = getTimeComponent();
        final long randomBits = getRandomComponent();

        return new UUID(timeBits, randomBits);
    }

    /**
     * Test method that generates many Nano COMBs in a loop.
     * 
     * @param args
     */
    public static void main(String[] args) {

        NanoCombCreator creator = new NanoCombCreator();

        for (int i = 0; i < 100; i++) {
            // Generate a Nano COMB
            UUID uuid = creator.create();

            // Extract the milliseconds and nanoseconds
            long milliseconds = uuid.getMostSignificantBits() / ONE_MILLION_NANOSECONDS;
            long nanoseconds = uuid.getMostSignificantBits() & ONE_MILLION_NANOSECONDS;

            // Instantiate an instant using the milliseconds and nanoseconds
            Instant time = Instant.ofEpochMilli(milliseconds).plusNanos(nanoseconds);

            // Print the UUID and the time it was generated (UTC)
            System.out.println("UUID: '" + uuid + "', time: " + time);
        }
    }
}

OUTPUT:

UUID: '16240ee8-3865-1503-d1fb-b4e85f991c6b', time: 2020-07-22T11:15:58.537327680Z
UUID: '16240ee8-3865-f90a-ca19-3ec529750ef7', time: 2020-07-22T11:15:58.537344064Z
UUID: '16240ee8-3866-dd7c-f32f-7acaebcf7766', time: 2020-07-22T11:15:58.537409664Z
UUID: '16240ee8-3868-0a99-3ead-b114e1d61520', time: 2020-07-22T11:15:58.537524800Z
UUID: '16240ee8-3868-efc8-937d-599c72de71a6', time: 2020-07-22T11:15:58.537541248Z
UUID: '16240ee8-386a-3643-6a5e-e3b5e3b03c71', time: 2020-07-22T11:15:58.537655936Z
UUID: '16240ee8-386b-132f-7016-057ab30a2920', time: 2020-07-22T11:15:58.537721408Z
UUID: '16240ee8-386b-f929-d5b0-f70b68aea3d9', time: 2020-07-22T11:15:58.537737280Z
fabiolimace
  • 972
  • 11
  • 13
  • Thanks for answering! I am aware of the uses of System.nanoTime and your suggestions concerning SecureRandom are spot on. That is what I also use as a rule. The issue as raised in the question is that I want to increase the time sensitivity of the algorithm if possible, to allow the sorting of the output to occur even when ulids are generated at shorter time intervals. – gbenroscience Jul 22 '20 at 04:35
  • On the other hand since you have indicated that the time component must be in millis, and there is no java time that is more precise than System.currentTimeMillis(), I will accept your answer. Thanks. – gbenroscience Jul 22 '20 at 04:37
  • 1
    Sorry. I didn't answer your questions. I just updated my answer. – fabiolimace Jul 22 '20 at 11:09
  • 1
    "I coincidently was working on it a few days ago. It actually has millisecond accuracy using the method" wow! How you were working on something that I needed some days ago. God is wonderful. Two people in different parts of the world working on something similar, and then meeting on stackoverflow by what men call `chance`. I will look at it. Thanks for everything! – gbenroscience Jul 22 '20 at 13:55
  • I assume that the output would still be sortable lexicographically speaking like a ULID , yes? – gbenroscience Jul 22 '20 at 14:04
  • 1
    The generator tries to be monotonic, i.e., a function that aways increases its value. If the timestamp goes too far ahead of the real time, it forces to wait one millisecond until the real clock catches up. In theory it can generate 1,000,000 values per millisecond. But good unit tests are necessary to garantee it. – fabiolimace Jul 22 '20 at 14:21
  • 1
    Maybe you could look at the ULID spec page: https://github.com/ulid/spec There's a list of many implementations for many languages there, including Java. I think it's better to use one implementation that many people already use. – fabiolimace Jul 22 '20 at 14:24