-4

Strings which begin with number should be ordered first, then strings (which may or may not contain numbers next)

Here's sample data :

1000 apples
500 apples
10 apples
10 bananas
Container 100
Box 10
Container 1000
Container 123
Container xyz

desired ordering

10 apples
10 bananas
500 apples
1000 apples
Box 10
Container 100
Container 123
Container 1000
Container xyz
Thom A
  • 88,727
  • 11
  • 45
  • 75
BigIWT
  • 243
  • 1
  • 5
  • 14
  • 1
    @ThomA it seems the OP wants to order by number first no matter the location, then by text, which is why 1000 comes after 500 or 123 no matter where it appears. That's just bad design and *very* hard to fix – Panagiotis Kanavos Aug 31 '23 at 11:21
  • 1
    It would be a lot better to clean up the data *before* inserting it in the database. Strings are strings, no matter what they contain, and are sorted using string rules. While one could split the string to extract and cast one part to a number, the question shows numbers both at the start *and* end of the string. Not only is it very hard to parse this using T-SQL, any queries will be very slow because they can't use indexes – Panagiotis Kanavos Aug 31 '23 at 11:23
  • 1
    Yep, I just noticed that they want `500` before `1000` and `123` before `1000`. Seems the correct solution is for them to fix their design, @PanagiotisKanavos . – Thom A Aug 31 '23 at 11:24
  • I hope you don't have any values like `10 boxes 50 apples`. – Thom A Aug 31 '23 at 11:24
  • 1
    It's not enough to use a regular expression to extract a number here - T-SQL doesn't have regular expressions anyway. In C# (or Python) you could use `\d+` to find the first number and extract it to a separate field, as long as there aren't mulutiple number, eg `1000 CDs Mambo No 9`. In general, cleaning up and recognizing text parts is the job of text recognizers like Python's NTLK and Spacy. The overall category is called Named Entity Recognition – Panagiotis Kanavos Aug 31 '23 at 11:28
  • 1
    I’m voting to close this question because there is no question. – HABO Aug 31 '23 at 17:15

1 Answers1

0

Something like this perhaps:

select *
from (
    VALUES  (N'1000 apples')
    ,   (N'500 apples')
    ,   (N'10 apples')
    ,   (N'10 bananas')
    ,   (N'Container 100')
    ,   (N'Box 10')
    ,   (N'Container 1000')
    ,   (N'Container 123')
    ,   (N'Container xyz')
    ,   (N'Container')
) t (col1)
cross apply (
    select  
        left(col1, NULLIF(charindex(' ', col1), 0) - 1) as leftWord
    ,   right(col1, NULLIF(charindex(' ', reverse(col1)), 0) - 1) as rightWord
    ) b
order by case when leftword not like '%[^0-9]%' then cast(leftWord as bigint) else 9999999999 end
,   case when rightword not like '%[^0-9]%' then cast(rightWord as bigint) else 9999999999 end
,   leftword
,   rightword
,   col1

This sorts from "both sides" of the word, if you don't want sort from right numbers, just remove the rightword case when.

siggemannen
  • 3,884
  • 2
  • 6
  • 24