1

Does T-SQL have any ability to do custom comparison of strings (for sorting), equivalent to .NET's IComparer?

Something like the ability to give Order By a user defined function that takes 2 strings and returns a value representing how they compare (greater than, less that, equal to)?

I currently have a C# ICompararer implementation that is being used to sort things in code, but now I need to produce this same sorted output from a stored proc.

For reference, this is the IComparer that I am trying to implement in TSQL.

public class SemanticComparer : IComparer<string>
{
    private static Regex _splitter = new Regex("\\W+");

    public int Compare(string x, string y)
    {
        string[] partsX = _splitter.Split(x);
        string[] partsY = _splitter.Split(y);

        int shortest = Math.Min(partsX.Length, partsY.Length);

        for (int index = 0; index < shortest; index++)
        {
            int intX, intY;
            int result;

            if (int.TryParse(partsX[index], out intX) && int.TryParse(partsY[index], out intY))
            {
                result = intX.CompareTo(intY);
            }
            else
            {
                result = string.Compare(partsX[index], partsY[index], StringComparison.Ordinal);
            }

            if (result != 0)
            {
                return result;
            }
        }

        return 0;
    }
}

It would need to be able to sort things that look like this (in order as they should be output):

  • 2-101.11(A)(B)
  • 9.1
  • 9.2
  • 9.2.1
  • 9.02.2
  • 9.02.3
  • 9.3
  • 9.3.1
  • 9.3.2
  • 10.
  • 11.
  • 11.1
  • 11.2.a
  • 11.2.b
  • 11.2.c
  • 11a.2.a
  • 11b.2.b
  • 21 CFR 110.10
  • 046.981(e)(m)

Where each non-word character splits the string in to segments, tries to compare them numerically if possible, and then as strings if not. There can be any "depth" to the number of segments.

CLR Stored Procedures are, unfortunately, not an option.

Bradley Uffner
  • 16,641
  • 3
  • 39
  • 76
  • Probably a `case` expression. – shawnt00 Jan 19 '17 at 18:01
  • Could you go in to some more detail on using `case`? I've done some searching, and I see how it could be used to some level of programmatic sorting, but I don't see how to use it to get the result I'm looking for. – Bradley Uffner Jan 19 '17 at 18:17
  • So you have two string values which you want to treat as vectors and sort them "alphabetically". What do your values look like? – shawnt00 Jan 19 '17 at 18:20
  • I just updated the question to include examples. – Bradley Uffner Jan 19 '17 at 18:23
  • What about SQL functions: are those ruled out too? – shawnt00 Jan 19 '17 at 18:33
  • Anything that would work on SQL Server 2008 r2 or lower is acceptable, other than using CLR. – Bradley Uffner Jan 19 '17 at 18:37
  • Sorry, that should have been "2008 r2 or **lower** " – Bradley Uffner Jan 19 '17 at 18:41
  • Why in the world is 9.02 after 9.2? That doesn't make sense. The rest of this is pretty simple. – Sean Lange Jan 19 '17 at 19:31
  • @SeanLange The 9.02.2 is after 9.2.1 because because the 2nd group is "2" when treated as integers, and the 3rd group is sorted numerically. Basically, "02" collapses down to "2". – Bradley Uffner Jan 19 '17 at 19:36
  • You can use [`ParseName`](https://msdn.microsoft.com/en-us/library/ms188006.aspx?f=255&MSPPError=-2147217396) if you won't have more than four levels (three dots) to separate the segments. A UDF can perform the numeric/string comparison. That just leaves ordering. – HABO Jan 19 '17 at 19:38
  • That just doesn't make a lot of sense. You kind of sometimes want to sort by the values as numbers and other times as characters. Ideally these values would be in different columns instead of slammed together like this. – Sean Lange Jan 19 '17 at 19:40
  • I would love to store them in separate columns, but these values represent references to paragraphs of legal code. They use all kinds of crazy formats and special characters across the documents. The individual rows are a flattened list from a hierarchical document on the client's end. – Bradley Uffner Jan 19 '17 at 19:42
  • @HABO Unfortunately they can go well beyond 3 dots, and they can use separators other than dots. A more complex example is `21 CFR 110-129`, that would be broken down in to "21" "CFR" "110" and "129". There are also values like `Apx J. 16` and `46.243(a)(c)(d)`. – Bradley Uffner Jan 19 '17 at 19:44
  • If I have to, I can resort to periodically sorting them in C#, then storing the sorted index in an additional field on the table, but I would rather avoid that if possible. I know this is kind of a crazy way to sort things, but this is what the customer wants. I didn't design the system; I just work here. – Bradley Uffner Jan 19 '17 at 19:50

1 Answers1

2
with data as (
    select c from (values
        ('9.1'), ('9.2'), ('9.2.1'), ('9.02.2'), ('9.02.3'), ('9.3'), ('9.3.1'), ('9.3.2'),
        ('10.'), ('11.'), ('11.1'), ('11.2.a'), ('11.2.b'), ('11.2.c'), ('11a.2.a'), ('11b.2.b')
    ) t(c)
)
select c, '[' +
    right('00000' +
        substring(c, 1, charindex('.', c + '.0.0', 1) - 1) +
            case when substring(c + '.0.0', charindex('.', c + '.0.0', 1) - 1, 1) between '0' and '9' then ' ' else '' end,
        6
    ) + '.' +
    right('00000' +
        substring(c, charindex('.', c + '.0.0', 1) + 1, charindex('.', c + '.0.0', charindex('.', c + '.0.0', 1) + 1) - charindex('.', c + '.0.0', 1) - 1) +
            case when right('0' +
                substring(c, charindex('.', c + '.0.0', 1) + 1, charindex('.', c + '.0.0', charindex('.', c + '.0.0', 1) + 1) - charindex('.', c + '.0.0', 1) - 1),
                1
            ) between '0' and '9' then ' ' else '' end,
        6
    ) + '.' +
    right('00000' +
        substring(c, charindex('.', c + '.0.0', charindex('.', c + '.0.0', 1) + 1) + 1, 10) +
            case when right('0' +
                substring(c, charindex('.', c + '.0.0', charindex('.', c + '.0.0', 1) + 1) + 1, 10),
                1
            ) between '0' and '9' then ' ' else '' end,
        6
    ) + ']'
from data
order by 2;

I threw this together for fun. As you can see, string parsing in SQL is usually a lot of work and has a lot of repeated subexpressions.

The idea here is that it appears you can transform your values in a normalized format that is sortable via a regular alphabetic sort. The front numeric portion is zero-padded. A single alphabetic character is allowed at the end of each "field", otherwise a space is appended. As one example 11.2.a becomes [00011 .00002 .00000a]

It's fairly flexible and you could certainly wrap up this logic in a scalar function. I'll let you decide whether this whole thing is a good idea.

http://rextester.com/KZX77690

http://rextester.com/LYL6977 (Slightly improved?)

shawnt00
  • 16,443
  • 3
  • 17
  • 22