It looks like we are limited to four comparator types when it comes to sorting the columns within a row of a Cassandra table. The four types I can see are:

BytesType, AsciiType, UTF8Type, IntegerType

However, sorting text properly for a given language is normally done with strcoll(), which uses the locale and therefore places certain characters before or after others depending on the language.

For example, in French the accented forms of the letter e are sorted as follows:

... d e é ê è ë f ...

I would imagine that UTF8Type is not going to produce the order a French speaker expects, since it knows nothing about the locale.
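To make the mismatch concrete, here is a minimal Java sketch (Java only because that is what Cassandra is written in). It assumes that plain String ordering is a fair stand-in for what UTF8Type does, and it uses java.text.Collator as the rough equivalent of strcoll(). The class and method names are made up for illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class CollationDemo {
    // Sort with String's natural order (UTF-16 code units). For the characters
    // used here this matches code point order, which is ASSUMED to approximate
    // the order UTF8Type would give.
    static List<String> sortByCodePoint(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Comparator.<String>naturalOrder());
        return copy;
    }

    // Sort with the JDK's locale-aware collator for the given locale.
    static List<String> sortByLocale(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        // "\u00e9" is e with an acute accent, escaped to avoid source-encoding issues.
        List<String> letters = Arrays.asList("f", "\u00e9", "e", "d");
        System.out.println(sortByCodePoint(letters));              // d, e, f, then e-acute
        System.out.println(sortByLocale(letters, Locale.FRENCH));  // d, e, e-acute, then f
    }
}
```

With code point order the accented e lands after f, because U+00E9 is numerically larger than U+0066, whereas the French collator keeps it next to the plain e.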

Is implementing our own sort in Cassandra the only way to get that to work? (Argh, I don't like Java...)

Alexis Wilke

1 Answer

You can always fix the locale to a constant one so that you always get the same results. Alternatively, you could sort by Unicode code point rather than with Java's locale-aware algorithm.

Laserbeak
  • As far as I know, sorting by Unicode code point would be the same as UTF8Type, which is already offered... but that would not sort using language-specific rules. Also, I cannot fix the locale since my websites are to support many different languages, not just French... (although French and English would work fine with the French locale, German and Spanish have different rules for umlauts and acute accents...) – Alexis Wilke Oct 18 '15 at 04:55
  • Sorry, I guess I misunderstood your post. So you want to get different results based on locale? I was under the impression you wanted consistent results. – Laserbeak Oct 18 '15 at 14:27
  • I want the correct result, depending on a locale specific to that table. In other words, I want one table for English, one for French, one for German and one for Spanish (for example, all languages could be represented.) And the resulting sorts should match that locale in that table. That way I can have indexes that are human language specific. – Alexis Wilke Oct 18 '15 at 20:45
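
One possible way to get a per-table locale order without writing a custom comparator, sketched here under assumptions rather than offered as a tested Cassandra solution: compute a locale-specific sort key on the client and store its bytes as the column name in a BytesType table. The sketch assumes BytesType compares names as unsigned bytes; `sortKey` and `compareBytes` are hypothetical helper names. It relies on the documented property of java.text.CollationKey that comparing the byte arrays from toByteArray() gives the same result as comparing the keys.

```java
import java.text.Collator;
import java.util.Locale;

class CollationKeyDemo {
    // Hypothetical helper: the bytes that would be stored as the column name
    // for a table dedicated to the given locale.
    static byte[] sortKey(String text, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        return collator.getCollationKey(text).toByteArray();
    }

    // Unsigned lexicographic byte comparison, ASSUMED to mirror how a
    // BytesType comparator would order the stored keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] plainE = sortKey("e", Locale.FRENCH);
        byte[] acuteE = sortKey("\u00e9", Locale.FRENCH);
        byte[] f = sortKey("f", Locale.FRENCH);
        System.out.println(compareBytes(plainE, acuteE) < 0); // true
        System.out.println(compareBytes(acuteE, f) < 0);      // true
    }
}
```

The sort key cannot be turned back into the original text, so the readable string would have to be stored as well, for example in the column value. Keys produced for different locales are not comparable with each other, which fits the one-table-per-language layout described above.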