3

I'm designing an HTTP service with a capacity of up to 500 million requests per day (served by more than one independent machine).

For each request I have to generate a unique ID and return it to the user. The ID must be 100% unique within a window of 10 minutes. (One day is preferred; globally unique IDs are ideal.) Generating the ID must not require any server-to-server communication.

Silly pseudo-session example:

Client: GET /foo

Server: Content-Type: text/xml

        <root>
            <id>ab9d1972-2844-11e0-86b2-000c29544403</id>
            <other_data/>
        </root>

In the previous generation of this HTTP service I used UUIDs.

I'm happy with UUIDs, but there is one problem: they are too long. At that number of requests, the extra size is noticeable as wasted disk space in the log files.

What is the best way to create a short but unique identifier? To make it worthwhile, I guess the algorithm should produce IDs at most half the length of a UUID while staying unique for a whole day (IDs that only need to be unique for 10 minutes should be even shorter).

Ideally, the suggested algorithm would have a sane, lightweight, production-quality implementation in plain C.

Update: The generated ID should not require URI-encoding when passed in a GET request.

Alexander Gladysh
  • Lazy question (sorry, it is too late at night to do math): how long is UUID if encoded with ascii85 from binary? – Alexander Gladysh Jan 29 '11 at 00:35
  • @Alexander: Number of digits is `ceil(log(max_val)/log(num_different_chars))`. – Oliver Charlesworth Jan 29 '11 at 00:40
  • ASCII85 encodes 4 bytes in 5 characters. However, it is not *really* URI- or human-friendly. (A UUID is 128 bits, i.e. 16 bytes, i.e. 20 characters in ASCII85.) –  Jan 29 '11 at 00:43
  • As far as making it unique goes, it depends on the exact requirements, but consider an approach like [twitter snowflake (twitter message numbers)](http://engineering.twitter.com/2010/06/announcing-snowflake.html) -- it uses only 64 bits, but with a careful selection of machine/worker identification, time, and counters to guarantee uniqueness within the environment. Much more "guessable", but that's a weak reason not to use an approach better fitted to the problem space. –  Jan 29 '11 at 00:46
  • @pst: why is ASCII85 not URI-friendly? (human-friendliness is not an issue) 20 characters is nice! – Alexander Gladysh Jan 29 '11 at 00:50
  • @Alexander Gladysh: While Base64 has one (or two?) characters that must be escaped in a URI, ASCII85 contains far more. URI encoding != URI friendly, and it is a real bummer to look at in a location bar. –  Jan 29 '11 at 00:53
  • @pst: ah! you're right... no, it would not do then. I need something that would not require URI encoding. Tripling the length is not good. – Alexander Gladysh Jan 29 '11 at 00:54
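
To put numbers on the lengths discussed in the comments above, here is a minimal sketch in C that counts how many characters the largest value of a given byte width needs in a given alphabet. The function name `encoded_length` is made up for this illustration. It uses integer long division instead of the logarithm formula so that no floating-point rounding is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of characters needed to write the largest
   nbytes-wide value (all bits set) using an alphabet of alphabet_size
   symbols. */
static unsigned encoded_length(size_t nbytes, unsigned alphabet_size)
{
    uint8_t value[32];   /* big-endian working copy of the number */
    unsigned digits = 0;

    assert(nbytes >= 1 && nbytes <= sizeof value);
    assert(alphabet_size >= 2 && alphabet_size <= 256);
    memset(value, 0xFF, nbytes);   /* largest value: every bit set */

    for (;;) {
        int nonzero = 0;
        unsigned rem = 0;
        /* One pass of long division by alphabet_size strips one digit. */
        for (size_t i = 0; i < nbytes; i++) {
            unsigned cur = rem * 256u + value[i];
            value[i] = (uint8_t)(cur / alphabet_size);
            rem = cur % alphabet_size;
            if (value[i] != 0)
                nonzero = 1;
        }
        digits++;
        if (!nonzero)   /* quotient reached zero: all digits counted */
            break;
    }
    return digits;
}
```

For a 16-byte UUID this gives 32 characters in hex, 22 in a 64-character URL-safe alphabet and 20 in ASCII85; an 8-byte value needs 11 characters in a 64-character alphabet.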

2 Answers

5

Give each machine a unique prefix. Give each machine a counter. To generate an ID, increment the counter, and append its value to the prefix.
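
A minimal sketch of that idea in plain C might look like the following. The names (`id_gen`, `id_next_raw`, `id_encode`), the 16-bit prefix / 48-bit counter split and the choice of the RFC 4648 URL-safe alphabet are all assumptions made for this illustration, not part of the answer. The sketch is not thread-safe, and it does not deal with restarts: the counter would have to be persisted, or seeded from something like the start-up time, so that it never repeats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* URL-safe alphabet (RFC 4648 "base64url"): none of these characters
   needs percent-encoding in a URI. */
static const char ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

enum { ID_LEN = 11 };   /* ceil(64 / 6) characters for a 64-bit value */

/* Hypothetical generator state: one per machine (or per worker). */
typedef struct {
    uint16_t machine_id;   /* unique prefix, assigned by configuration */
    uint64_t counter;      /* local counter; only the low 48 bits are used */
} id_gen;

/* Pack prefix and counter into one 64-bit value, then advance the counter. */
static uint64_t id_next_raw(id_gen *g)
{
    uint64_t raw = ((uint64_t)g->machine_id << 48)
                 | (g->counter & 0xFFFFFFFFFFFFULL);
    g->counter++;
    return raw;
}

/* Encode a 64-bit value as 11 URL-safe characters plus a terminating NUL. */
static void id_encode(uint64_t raw, char *out, size_t out_size)
{
    assert(out_size >= ID_LEN + 1);
    for (int i = ID_LEN - 1; i >= 0; i--) {
        out[i] = ALPHABET[raw & 63];   /* lowest 6 bits go rightmost */
        raw >>= 6;
    }
    out[ID_LEN] = '\0';
}
```

With this layout every ID is 11 characters, against 36 for a UUID in its usual text form, and 48 bits of counter are far more than 500 million requests a day will use up.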

If you want to obfuscate the IDs, encrypt them - a cipher is a reversible transformation, so applying it to unique values will produce unique values.
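
To illustrate why a reversible transformation keeps IDs unique, here is a rough sketch of a 4-round Feistel network over a 64-bit ID. A Feistel network is invertible whatever the round function is, so distinct inputs always give distinct outputs. Everything here (`round_fn`, `id_obfuscate`, `id_deobfuscate`, the mixing constants, the number of rounds) is made up for the illustration and is not a vetted cipher; for real obfuscation a reviewed block cipher with a 64-bit block would be the safer choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ROUNDS = 4 };

/* Hypothetical round function. It does not need to be invertible;
   the Feistel structure provides reversibility on its own. */
static uint32_t round_fn(uint32_t half, uint32_t key)
{
    uint32_t x = half ^ key;
    x *= 0x9E3779B1u;
    x ^= x >> 15;
    x *= 0x85EBCA77u;
    x ^= x >> 13;
    return x;
}

/* Map a 64-bit ID to another 64-bit value; a bijection for fixed keys. */
static uint64_t id_obfuscate(uint64_t id, const uint32_t keys[ROUNDS])
{
    assert(keys != NULL);
    uint32_t left = (uint32_t)(id >> 32);
    uint32_t right = (uint32_t)id;
    for (int i = 0; i < ROUNDS; i++) {
        uint32_t next = left ^ round_fn(right, keys[i]);
        left = right;
        right = next;
    }
    return ((uint64_t)left << 32) | right;
}

/* Inverse of id_obfuscate: runs the rounds backwards with the same keys. */
static uint64_t id_deobfuscate(uint64_t obf, const uint32_t keys[ROUNDS])
{
    assert(keys != NULL);
    uint32_t left = (uint32_t)(obf >> 32);
    uint32_t right = (uint32_t)obf;
    for (int i = ROUNDS - 1; i >= 0; i--) {
        uint32_t prev = right ^ round_fn(left, keys[i]);
        right = left;
        left = prev;
    }
    return ((uint64_t)left << 32) | right;
}
```

All machines would share the same round keys, so that the mapping stays one-to-one across the whole service.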

Tom Anderson
2

A few thoughts:

  • 500 million requests a day. Really?
  • Use UUIDs.
  • If required, don't use HTTP (as that's the more significant overhead) and transfer the UUID in a binary form.
  • You need a certain number of bytes to guarantee that your server returns a truly unique ID.
  • How about using UDP?

Anyway, what the heck are you trying to do?

BastiBen