
I have subscribed to a cloud drive service. All was good until I needed to check that every folder in a long list had actually been uploaded. I realized that the name-based sort order on my local system differs from the one used by the remote cloud service (or its web interface). So I tried to figure out what the remote sort order is, so that I could use it locally as well (I had already looked for a setting to change it on the remote side, to no avail). I am totally lost. So the question is:

What rules/locale sorts the following strings in this exact order?

T. J. Smith

T. Smith

T.J. Smith

Talya

T'Amya

Tamya

(I put Talya in there to show that the sort order is "ascending", because in any reasonable sort order Talya comes before Tamya.)

I have tried different ways to sort the list of strings, hoping that one would match the cloud service's own order. This is what I tried:

In Windows 10 (with my locale!) this list is sorted as:

[image: the list as sorted in Windows 10]

In my Ubuntu Nautilus I get this:

[image: the list as sorted in Ubuntu Nautilus]

And, finally, if I put those strings in a file (sortme.txt) and call "sort" from the command line in Ubuntu, I get the following:

(first LC_COLLATE=C)

[image: output of sort with LC_COLLATE=C]

(second, LC_COLLATE=en_US.UTF-8)

[image: output of sort with LC_COLLATE=en_US.UTF-8]

As you can see, none of these matches the desired ordering. In particular, none reproduces the order of the strings "Talya", "T'Amya", "Tamya".
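For anyone who wants to experiment with further candidate rules, here is a minimal Python sketch of the trial-and-error described above. It checks whether a given sort key reproduces the observed order. The names `reproduces`, `letters_only` and `CANDIDATES` are made up for illustration, and the three candidate rules are only guesses, not anything the cloud service is known to use.

```python
# The order observed in the cloud drive's web interface (from the question).
OBSERVED = ["T. J. Smith", "T. Smith", "T.J. Smith", "Talya", "T'Amya", "Tamya"]


def reproduces(key, observed=OBSERVED):
    """Return True if sorting by `key` yields exactly the observed order."""
    return sorted(observed, key=key) == list(observed)


def letters_only(name):
    """Hypothetical pre-filter: keep only letters and digits, ignore case."""
    return "".join(ch for ch in name if ch.isalnum()).casefold()


# Candidate rules to try; the labels and rules are illustrative guesses.
CANDIDATES = {
    "code point order (what LC_COLLATE=C gives for ASCII names)": lambda name: name,
    "case-insensitive code point order": str.casefold,
    "punctuation and spaces stripped, case ignored": letters_only,
}

if __name__ == "__main__":
    for label, key in CANDIDATES.items():
        print(f"{label}: {reproduces(key)}")
        print("   ", sorted(OBSERVED, key=key))
```

None of these three candidates reproduces the full list. Stripping punctuation does keep Talya, T'Amya, Tamya in the observed relative order, but it moves the three Smith entries after them.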

I would be very thankful if anybody could help me sort this out :-)

julio
  • Unicode has lots of things that look like dots and spaces. Can you confirm that all of the space characters in your example have codepoint 0x20, and that all of the dot characters have codepoint 0x2E? – Sneftel Dec 23 '22 at 13:18
  • File managers may or may not sort file names lexicographically according to the locale collation order. Mine does either `T. J. Smith` `T. Smith` `T.J. Smith` `T'Amya` `Talya` `Tamya` or `T'Amya` `T. J. Smith` `T. Smith` `T.J. Smith` `Talya` `Tamya` depending on whether "alphabetical" or "natural" sort order is chosen. Everybody does it differently, there are no generally applicable rules. – n. m. could be an AI Dec 23 '22 at 13:42
  • @Sneftel I have already renamed the folders from this example in the remote system, retyping all the dots and the apostrophe from my keyboard. Nothing changed. Maybe the spaces need more attention, but what is driving me crazy now is the order of Talya, T'Amya, Tamya, which I could not reproduce with any of the systems I have tried – julio Dec 23 '22 at 15:35
  • @n. m. I have come to understand this. If you are saying that there are too many locales to check them all, that is exactly my point: I wish somebody could just sort this list in their own locale and find that it matches the order of my cloud drive. Besides, in Ubuntu 18.04, even though each locale could have its own collation rules, they actually ALL default to the same one. – julio Dec 23 '22 at 15:40
  • I suspect that this is not a locale at all, but an internal filtering operation being applied to the strings before comparison. For instance, `'` might be URL-encoded, or simply stripped. Figuring out the rules would require much more interactive experimentation. – Sneftel Dec 23 '22 at 15:47
  • These collation orders may or may not be determined by locales per se, as defined by C/C++/POSIX/whatever. They might be just application-private collation orders. By the way, why do you need to replicate some unknown sorting rules at all? – n. m. could be an AI Dec 23 '22 at 16:01
  • @n. m. This is the problem I am facing: I have uploaded a long list (hundreds) of folders in bulk, and I have had, and still have, silent upload failures: some folders did not show up in the remote storage, with no warning. This forces me to double-check the uploads, and the easiest way I can think of is to visually compare my local list of folders with the remote one. This would work decently enough if only some folders did not end up in a different position in the remote list, sometimes a few screens away. – julio Dec 23 '22 at 16:12
  • Why not just sort both lists locally? – n. m. could be an AI Dec 23 '22 at 16:15
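The suggestion in the last comment can be taken one step further: if both lists of folder names are available as text, the sort order stops mattering and the check becomes a set difference. Below is a minimal sketch in Python. How the remote list is obtained (for example by copying it from the web interface) is an assumption, and `normalize` and `missing_remotely` are illustrative names. The NFC normalization is there only to guard against look-alike Unicode forms, as raised in the first comment.

```python
import unicodedata


def normalize(name):
    """Normalize a folder name so visually identical names compare equal."""
    return unicodedata.normalize("NFC", name).strip()


def missing_remotely(local_names, remote_names):
    """Return the local names that do not appear in the remote list, sorted."""
    remote = {normalize(name) for name in remote_names}
    return sorted(name for name in local_names if normalize(name) not in remote)


if __name__ == "__main__":
    # Example data; in practice these would be read from the two listings.
    local = ["Talya", "T'Amya", "Tamya", "T. Smith"]
    remote = ["Tamya", "T. Smith", "Talya"]
    print(missing_remotely(local, remote))
```

With the example data above, the only name reported as missing is `T'Amya`, regardless of the order in which either list was given.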

1 Answer


Not sure whether I should delete this question altogether. I realized that the sorting algorithm I was trying to understand is just severely bugged: its behavior is outright wrong, in that it puts half of a list in increasing order and the other half in decreasing order. So this order is probably just the result of a bug.

Anyway, I resorted to using the cloud storage from another OS, with another interface, which just works as expected.

julio