-2

I am looking for a way to generate unique random ids that will be as short as possible (similar to url shorteners) either through code or using MSSQL (preferable).

I know this can be done using NEWID or CRYPT_GEN_RANDOM(), but these methods only use A-F and 0-9, while I am looking for a solution that will use all 62 characters (A-Z,a-z,0-9) and therefore obtain a much shorter id.

user1480192
  • 665
  • 8
  • 23
  • Neither `NEWID` nor `CRYPT_GEN_RANDOM` return any characters at all. They return `uniqueidentifier` and `varbinary(8000)` data types respectively. – Damien_The_Unbeliever Mar 27 '19 at 07:13
  • We'd also need to know *how many* of these unique identifiers you'll need to generate because that actually determines how short they can be. Would the slightly expanded range of [Base 64](https://en.wikipedia.org/wiki/Base64) be a suitable target rather than the 62 you've identified? Or another *standard* encoding? – Damien_The_Unbeliever Mar 27 '19 at 07:16
  • 1
    What's wrong with using a UUID? It's only 16 bytes long. –  Mar 27 '19 at 07:48
  • The ids need to be used as parts of URLs so base 64 encoding is not a possibility because it uses some non-standard characters. I need to generate about 100 million unique codes which will be both unique and random. – user1480192 Mar 27 '19 at 07:52
  • 1
    So base 64, including *the variant specifically designed for use in URLs* is not suitable because you're using them in URLs? – Damien_The_Unbeliever Mar 27 '19 at 07:59
  • Base64 is perfectly usable with [some tweaks](https://brockallen.com/2014/10/17/base64url-encoding/). Note that most URL shorteners do not work this way -- they generate random IDs, but not unique ones, allowing IDs to expire/be reused with time. This allows for much shorter URLs at the cost of link rot. – Jeroen Mostert Mar 27 '19 at 08:58

1 Answers1

0

I can't say this is a suitable task for SQL Server, but it is achievable nonetheless.

First, you need to create the mapping table that contains all the characters you want to see in your URLs:

declare @charmap table (
  Id tinyint identity(0,1) primary key,
  Letter char(1) collate Latin1_General_BIN not null unique
);

insert into @charmap (Letter)
select top (26) char(row_number() over(order by (select null)) - 1 + ascii('A'))
from sys.all_objects ao
union all
select top (26) char(row_number() over(order by (select null)) - 1 + ascii('a'))
from sys.all_objects ao
union all
select top (10) char(row_number() over(order by (select null)) - 1 + ascii('0'))
from sys.all_objects ao;

In this example, I used a table variable to minimise the impact. You can turn it into a static table, of course; in that case, it only needs to be populated once.

Now, the following query will generate the specified number of codes all having the same desired length. First two variables control that:

declare @BatchSize int = 1000,
  @Length int = 7;

select h.RndHex,
  replace(
    (select cm.Letter as [data()] from (
      select top (datalength(h.RndHex)) row_number() over(order by (select null)) as [RN]
      from sys.all_objects ao
    ) ca
    inner join @charmap cm on cm.Id = cast(substring(h.RndHex, ca.RN, 1) as tinyint) % 62
    order by ca.RN
    for xml path('')
    ), ' ', ''
  ) as [ShortURL]
from (
  select top (@BatchSize) crypt_gen_random(@Length) as [RndHex]
  from sys.all_objects a, sys.all_objects o
) h;

Technically, 5 characters will give you power(62., 5) ~=916 million of unique combinations. However, you might want to increase code length in order to make them harder to guess. That's why my sample generates 7-character codes - for 100 million combinations used, it will provide about 35000 possible combinations per every one which was actually generated.

On the other hand, if guessability isn't an issue for you, you can keep codes' length at a minimum, which is 5 characters in your case.

Roger Wolf
  • 7,307
  • 2
  • 24
  • 33