3

I'm wondering if Microsoft SQL Server supports sorting based on localization preferences. I see they have a page at https://technet.microsoft.com/en-us/library/cc835499(v=sql.110).aspx which lists how strings are compared based on the norms of the language. But I notice that the English and Dutch comparison both use Latin1_General_CI_AS.

The problem is in the Netherlands many have the last name with a tussenvoegsel. So I believe the last name of De Jong would come after Van Beethoven when sorting in ascending order.

My understanding is the Microsoft SQL Server does not support this use case but I am not 100% sure. Wondering if anyone else had a different experience. Thanks!

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
work_account
  • 75
  • 1
  • 1
  • 4
  • Just to clarify, you would expect it to come *after* when sorting *ascending*? Even with case sensitivity [**I would expect the opposite**](http://rextester.com/OHZB92513). – S3S Aug 20 '18 at 18:34
  • 1
    In Dutch, the surname prefix (de, van, ...) has a lower sorting precedence than Jong, Beethoven, etc. Marc, are you asking if SQL is aware of these differences? I think the answer is no. Sterker nog, I think it is impossible if your database has names from mixed backgrounds. How would you sort Pablo Picasso's full name, "Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso"? – user1443098 Aug 20 '18 at 18:55
  • I'm guessing by sort preference one would expect De and Van to be ignored completely... this would make more sense and what @user1443098 elaborated on is a great point. It's similar to normalizing names, though i can see in Dutch parsing out or keeping this separate may be difficult. – S3S Aug 20 '18 at 18:58
  • 1
    Good day, The answer is yes, SQL Server supports localized sorting. It seems like you are confusing `linguistic sorting` which is totally different thing, with `alphanumeric sorting` by language. The language in Netherlands is Dutch and D comes before V therefore `alphanumeric sorting` (simple "order by") will be according to this. SQL Server sort `alphanumeric` according to the COLLATE and the fitting COLLATE for Dutch (Netherlands) is Latin1_General_CI_AS or Latin1_General_CS_AS. For `linguistic sorting` you can use `FUll Text Search` Technics. – Ronen Ariely Aug 20 '18 at 19:14
  • 1
    @scsimon yes I'd expect 'Van Beethoven' to come before 'De Jong', the surname prefix has lower precedence. user1443098 yes that is what I am asking, yeah it seems like it isn't supported, I'm looking around for libraries that might implement this outside of SQL as well. scsimon yes I expect De and Van to be ignored completely. – work_account Aug 20 '18 at 20:15
  • @RonenAriely yes I believe you are correct I may be confusing linguistic sorting and alphanumeric sorting. I'll try and see if I can find an example of `linguistic sorting` this might lead me in the right direction. – work_account Aug 20 '18 at 20:20
  • Given the complexities of all sorts of cultures, not to mention the possibilities of family-name-first or family-name-last, perhaps you are better with an additional column name_sort on which you do your sorting. Sometimes the name order is not even rule-based, its just personal preference based. – TomC Aug 21 '18 at 00:56
  • (sligthly OT) Maybe it's because I'm Flemish (close relative to Dutch =) but I've never seen a list of names sorted the way you're proposing. Van Beethoven would always show up (way) after De Jong. Doing some googling around it seems that .nl sites indeed sort different vs .be sites: funny example on wikipedia: https://nl.wikipedia.org/wiki/Lijst_van_Vlaamse_artiesten (Luc Van Acker belongs to [V]) versus https://nl.wikipedia.org/wiki/Lijst_van_Nederlandse_artiesten (Bill van Dijk belongs to [D]). Strange how these things can differ, I was totally unaware. – deroby Oct 04 '18 at 17:10

1 Answers1

0

There is no way to tell SQL Server that certain parts of a text are to be treated different from other parts.

SQL Server knows and understands character-based sort orders (collations), e.g. é should be sorted as if it were e. But never word- and/or position-based, like Van should be ignored if it is followed by something.

Prefixes can have more than 1 word, e.g. "Van der" and "Van 't" are also quite common, and that is not all by far. So doing it 'right' by some kind of built-in rule would be a very complicated job.

Your only option to do this would be to store the prefix ("Tussenvoegsel") in a separate column.

Wikipedia mentions that using a separate 'Tussenvoegsel' column is commonly done in The Netherlands - while at the same time it mentions that this is not commonly done in the Dutch-speaking parts of Belgium, so you need even be aware of your audience.

Peter B
  • 22,460
  • 5
  • 32
  • 69