Question:
I need to sort a vector of character strings on my Linux (Ubuntu) machine so that the result matches the order produced on a Windows machine that uses the Windows-1252 locale collation.
On Windows it works like this:
> order(c("A", "a", "0", "_", "/"))
[1] 5 4 3 2 1
On Ubuntu 20.04 it works like this:
> order(c("A", "a", "0", "_", "/"))
[1] 4 5 3 2 1
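For comparison, R's `method = "radix"` orders character vectors in the C locale (plain byte order) regardless of the `LC_COLLATE` setting. On the same input it gives an order that matches neither machine, which suggests the difference comes from each platform's collation rules rather than from R's sorting code:

```r
x <- c("A", "a", "0", "_", "/")

# Radix ordering of strings uses C-locale (byte-value) collation:
# "/" (0x2F) < "0" (0x30) < "A" (0x41) < "_" (0x5F) < "a" (0x61)
order(x, method = "radix")
#> [1] 5 3 1 4 2
```

Windows puts `"_"` after `"/"`, Ubuntu puts it before, and byte order puts it after the uppercase letters.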
Further info: the locale on the Windows machine is:
> Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
What I've already tried is setting the locale to en_US.CP1252 on the Linux machine, but the sort order did not change:
> Sys.getlocale()
[1] "LC_CTYPE=en_US.CP1252;LC_NUMERIC=C;LC_TIME=en_US.CP1252;LC_COLLATE=en_US.CP1252;LC_MONETARY=en_US.CP1252;LC_MESSAGES=en_US.CP1252;LC_PAPER=en_US.CP1252;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.CP1252;LC_IDENTIFICATION=C"
I've also tried passing the desired collation locale to stringr's str_order function, with no luck:
> str_order(c("A", "a", "0", "_", "/"), locale = "en")
[1] 4 5 3 2 1
Is there a native way to enforce a specific collation when sorting? If not, how could I write my own sorter/comparator? Thank you!
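One way to write such a comparator is to map each character to a rank from an explicit table and sort on the resulting keys. The sketch below makes several assumptions. The ranking tables (`primary_rank`, `secondary_rank`) and the helpers `rank_key()` and `win_order()` are hypothetical. The relative order of `"/"`, `"_"`, digits and letters is inferred only from the five-string example above. The rule "compare case-insensitively first, then put lowercase before uppercase" is an assumption about the Windows behavior that should be checked on the Windows machine. The keys are compared with `method = "radix"`, which uses C-locale byte order, so the result does not depend on Linux's `LC_COLLATE`.

```r
# Hypothetical ranking tables, inferred only from the five-string example:
# "/" < "_" < digits < letters; in the secondary table lowercase < uppercase.
# Extend these (and re-check on Windows) for any other characters you need.
primary_rank   <- c("/", "_", as.character(0:9), letters)
secondary_rank <- c("/", "_", as.character(0:9),
                    as.vector(rbind(letters, LETTERS)))  # a, A, b, B, ...

# Turn each string into a key of zero-padded 3-digit ranks, so that comparing
# keys byte by byte (C locale) compares the strings rank by rank
rank_key <- function(x, ranking) {
  vapply(strsplit(x, ""), function(chars) {
    chars <- chars[nzchar(chars)]          # empty input string -> empty key
    r <- match(chars, ranking)
    if (anyNA(r)) stop("character not in ranking table: ", chars[is.na(r)][1])
    paste(sprintf("%03d", r), collapse = "")
  }, character(1))
}

# Order case-insensitively first, then break ties with the case-sensitive key.
# method = "radix" compares the character keys in the C locale.
win_order <- function(x) {
  order(rank_key(tolower(x), primary_rank),
        rank_key(x, secondary_rank),
        method = "radix")
}

x <- c("A", "a", "0", "_", "/")
win_order(x)
#> [1] 5 4 3 2 1
```

Any character missing from the tables raises an error instead of being silently misplaced, so the tables must be extended for real data (other punctuation, accented letters, and so on). The actual Windows collation may apply further rules that this sketch does not model.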