3

Because I am writing software which generates SQL with a potentially large number of parameters and records that SQL onto disk, I have an uncommon requirement (perhaps more a curiosity): Generate the shortest possible unique parameter names.

Parameter names follow identifier naming rules which are usually:

  1. First character is alphabetic
  2. Subsequent characters can be alphanumeric or certain other characters, such as an underscore.
  3. Almost anything can be used if quoted (ignored -- a quoted identifier is at least three characters total, e.g. [_])

The SQL generation code knows how many identifiers there are total, so names can be generated based on an integer.

Charles Burns
  • 10,310
  • 7
  • 64
  • 81

3 Answers3

1

This ended up being more difficult than I anticipated, and the solution less elegant. I hard-coded invalid values (starting with 0) because they are few and every attempt I made to derive them ended up complicated and slow. I would appreciate ideas on how to make this more elegant. I'll post on CodeReview as well.

Most databases support fewer than 2^16 parameters (a ridiculous number to actually use), but in handling numbers larger than 35027 (also ridiculous) the numbers worked out such that 1 million was a good forced stopping point.

public static String intToDatabaseIdentifier(int number)
{
    if(number < 0 || number > 1000000)
        throw new ArgumentOutOfRangeException("number");
    if(number > 25 && number <= 25 + 10) // Skip 0-9 (modified base 36)
        number += 10;
    if(number > 971 && number <= 971 + 360) // Skip 0a-09 (modified base 36)
        number += 360;
    if(number > 35027 && number <= 35027 + 12960) // Skip 0aa-099 (modified base 36)
        number += 12960;
    var stack = new Stack<char>();
    // Base 36 starting with letters rather than numbers
    const string characters = "abcdefghijklmnopqrstuvwxyz0123456789";
    while(number >= 0) {
        stack.Push(characters[number % 36]);
        number = number / 36 - 1;
    }
    return new String(stack.ToArray());
}

Results starting with 0:

a b c d e f g h i j k l m n o p q r s t u v w x y z
aa ab ac ad ae af ag ah ai aj aa ab ac ad ae af ag ah ai aj ak al am an ao
ap aq ar as at au av aw ax ay az a0 a1...
Charles Burns
  • 10,310
  • 7
  • 64
  • 81
  • 1
    Maybe you can add to the allowed characters a database is allowed to have. I notice you have followed the rule that identifiers cannot start with a number. Mentioned here http://stackoverflow.com/a/20004348/511438 is @, underscore, pound and dollar. – Valamas Sep 28 '15 at 04:18
  • True, that would be easy to do, though it may risk becoming less database-agnostic. – Charles Burns Sep 28 '15 at 04:28
  • Seems like your code produces colisions. Try 26 and 36. – Timur Mannapov Sep 28 '15 at 08:15
  • @TimurMannapov Yes there are collisions all over the place, and I have to take a shower every time I add an `if(range)...workaround` line. – Charles Burns Sep 28 '15 at 16:07
0

Code above produces collisions. Fixed code without collisions and magic numbers.

    public static String intToDatabaseIdentifier(int number)
    {
        const string abcFirst = "abcdefghijklmnopqrstuvwxyz";
        const string abcFull = "abcdefghijklmnopqrstuvwxyz0123456789";
        if (number < 0 || number > 1000000)
            throw new ArgumentOutOfRangeException("number");
        var stack = new Stack<char>();
        //Get first symbol. We will later reverse string. So last - will be first. 
        stack.Push(abcFirst[number % abcFirst.Length]);
        number = number / abcFirst.Length;
        //Collect remaining part
        while (number > 0)
        {
            int index = (number - 1) % abcFull.Length;
            stack.Push(abcFull[index]);
            number = (number - index) / abcFull.Length;
        }
        //Reversing to guarantee first non numeric.
        return new String(stack.Reverse().ToArray());
    }
Timur Mannapov
  • 217
  • 2
  • 9
0

Timur Mannapov's answer produces results similar to some of my other attempts (except his results don't have the problems noted in comments) in that the progression is not what one would expect, e.g. aa, ba, ca instead of aa, ab, ac: (call with String.Concat(ToParamName(i)))

// Starts with aa, ba, ba... instead of a, b, c. Probably wouldn't be hard
// to fix but I abandoned this method because it's annoying to call using
// string.Concat(...)
public static IEnumerable<char> ToParamName(int number) {
    const string characters = "abcdefghijklmnopqrstuvwxyz0123456789";
    yield return characters[number % 26];
    number = number / 26;
    do {
        yield return characters[number % 36];
        number = number / 36 - 1;
    } while(number >= 0);
}


// Starts with a, b, c...aa, ba, ba but has collisions starting around 960
public static IEnumerable<char> ToParamName(int number) {
    const string characters = "abcdefghijklmnopqrstuvwxyz0123456789";
    yield return characters[number % 26];
    number = number / 26;
    while(number > 0) {
        yield return characters[number % 36];
        number = number / 36 - 1;
    }
}

I prefer having the results returned in a more natural order like a..z, aa, ab, ac...a9 (hey, I didn't claim I was being purely practical), but I forgot to mention that in the original post. Timur's answer covers all the original requirements, so I'll mark it correct.

I'll +1 an answer that produces results as described.

Charles Burns
  • 10,310
  • 7
  • 64
  • 81