
I have written a shell script that gets all the file names from a folder, and all its sub-folders, and copies them to the clipboard after sorting (removing all paths; I just need a simple file list of the thousands of randomly named files within).
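The original script isn't shown, but here is a minimal sketch of that kind of pipeline. It assumes POSIX `find` and `sed`, and uses `pbcopy` (the macOS clipboard tool) only in the commented usage line. The helper name `list_file_names` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the base name (path stripped) of every regular
# file under the directory given as $1, one name per line. The sed expression
# deletes everything up to and including the last '/'.
list_file_names() {
  find "$1" -type f | sed 's|.*/||'
}

# Example usage on macOS (the folder path is a placeholder):
#   list_file_names ~/SomeFolder | sort | pbcopy
```

The sorting step is the part the question is about. Everything before it only produces the flat list of names.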

What I can’t figure out is how to get the `sort` command to sort properly, meaning the way a spreadsheet would sort things, or the way the Mac Finder sorts them.

Underscores > numbers > letters (regardless of case)

Anyone know how to do this? `sort -n` only works for files starting with numbers; `sort -f` was close but separated the lowercase and capitals in a weird way, and anything starting with a number was all over the place. `sort -V` was the closest, but anything starting with an underscore went to the bottom instead of the top… I’m about to lose my mind.

I’ve been trying to figure this out for a week, and no combination of anything I have tried gets the sort command to actually, ya know, sort properly.

Help?

DasKraut
  • Please [edit] your question to include a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) with concise, testable sample input, expected output and your attempt to solve the problem yourself so we can help you further. See [ask] and look at existing questions that have been upvoted and answered for examples. – Ed Morton May 25 '22 at 13:18
  • FWIW when I fill in 3 cells in Excel with the values `a`, `1`, and `_` then do `Date->Sort->OK` it sorts them in the order `1`, `_`, `a`, not `_`, `1`, `a` as you suggest in your question it would (`Underscores > numbers > letters`). – Ed Morton May 25 '22 at 13:28
  • Yeah, I had actually done that on my original post for this question, but they locked it for being "off topic" for some reason, and my appeal went unanswered. I think the reason your spreadsheet sorted that way is because you led with "date." Try copying and pasting this list into a Google Sheet and just hit the regular Sort A-Z button: S_creen Shot 123.png, 2_Screen Shot ABC.png, _3_Screen Shot GHI.png, 2_Screen Shot DEF.png, screen Shot 456.png, _ref.txt, 220517_Screen.png, Screen Shot 123.png – DasKraut May 25 '22 at 18:17
  • Here was my original post: [original post](https://stackoverflow.com/questions/72296765/terminal-how-to-get-sort-to-sort-correctly) – DasKraut May 25 '22 at 18:19
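To make the question reproducible, the sample names from the comment above can be saved one per line as `files.txt`, which is the input file name the first answer below assumes. This sketch only builds the test input and makes no claim about the expected order.

```shell
# Write the sample names from the comment, one per line, to files.txt.
printf '%s\n' \
  'S_creen Shot 123.png' \
  '2_Screen Shot ABC.png' \
  '_3_Screen Shot GHI.png' \
  '2_Screen Shot DEF.png' \
  'screen Shot 456.png' \
  '_ref.txt' \
  '220517_Screen.png' \
  'Screen Shot 123.png' \
  > files.txt
```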

2 Answers


If I understand the problem correctly, you want the "natural sort order" as described in "Natural sort order" (Wikipedia), "Sorting for Humans: Natural Sort Order", and "macos - How does finder sort folders when they contain digits and characters?".

With Linux `sort(1)`, you need the `-V` (`--version-sort`) option for "natural" sort. You also need the `-f` (`--ignore-case`) option to disregard the case of letters. So, assuming that the file names are stored one per line in a file called `files.txt`, you can produce a list (mostly) sorted in the way that you want with:

```shell
sort -Vf files.txt
```
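To show what "natural" means here, the following compares version sort with plain byte-order sort on a few made-up names. It pins `LC_ALL=C` so the result doesn't depend on the local collation settings.

```shell
# Version sort compares runs of digits by numeric value, so img2 comes before
# img10; -f folds case so IMG1 and img2 are ordered together.
printf '%s\n' img10.png img2.png IMG1.png | LC_ALL=C sort -Vf
# IMG1.png
# img2.png
# img10.png

# Plain byte-order sort, for comparison: digits are compared character by
# character and uppercase sorts before lowercase.
printf '%s\n' img10.png img2.png IMG1.png | LC_ALL=C sort
# IMG1.png
# img10.png
# img2.png
```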

However, `sort -Vf` sorts underscores after digits and letters on my system. I've tried using different locales (see "How to set locale in the current terminal's session?"), but with no success. I can't see a way to change this with `sort` options (but I may be missing something).
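A small reproduction of the underscore problem, using made-up names and assuming GNU `sort` in the C locale:

```shell
# With -Vf, a leading underscore sorts after both digits and letters.
printf '%s\n' a.txt _b.txt 1.txt B.txt | LC_ALL=C sort -Vf
# 1.txt
# a.txt
# B.txt
# _b.txt
```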

The characters `.` and `~` seem to consistently sort before numbers and letters with `sort -V`. A possible hack to work around the problem is to swap underscore with one of them, sort, and then swap again. For example:

```shell
tr '_~' '~_' <files.txt | LC_ALL=C sort -Vf | tr '_~' '~_'
```

seems to do what you want on my system. I've explicitly set the locale for the `sort` command with `LC_ALL=C ...`, so it should behave the same on other systems. (See "Why doesn't sort sort the same on every machine?")
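Here is the same pipeline wrapped in a function so it can read from stdin. The function name `sort_underscore_first` is hypothetical, and the sample names are made up. The expected order assumes GNU `sort -V`.

```shell
# Hypothetical helper: swap '_' and '~' so underscores sort first under
# version sort, sort, then swap back. Reads names on stdin, writes to stdout.
sort_underscore_first() {
  tr '_~' '~_' | LC_ALL=C sort -Vf | tr '_~' '~_'
}

printf '%s\n' a.txt _b.txt 1.txt B.txt | sort_underscore_first
# _b.txt
# 1.txt
# a.txt
# B.txt
```

Because the swap is applied again after sorting, names that genuinely contain `~` come back unchanged. Those names do sort where `_` would have, though.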

pjh
  • You are an absolute legend! This has done it! I don't know the `tr` command. I will be looking into that immediately. Seriously, I was ready to switch to an Amish lifestyle this was driving me so mad. Really appreciate it! – DasKraut May 25 '22 at 17:30
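For anyone else new to `tr`: it translates characters one for one, replacing each character in the first set with the character at the same position in the second set. With the sets `'_~'` and `'~_'`, it swaps the two characters, so applying it twice restores the original text. The names below are made up.

```shell
# One pass maps '_' -> '~' and '~' -> '_'.
printf '%s\n' '_ref.txt' 'a~b.txt' | tr '_~' '~_'
# ~ref.txt
# a_b.txt

# Two passes give back the original names.
printf '%s\n' '_ref.txt' 'a~b.txt' | tr '_~' '~_' | tr '_~' '~_'
# _ref.txt
# a~b.txt
```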

It appears you want to sort in dictionary order and fold case, so it would be `sort -df`.

Armali
  • According to the man page you linked `-d Specify that only blank characters and alphanumeric characters, according to the current setting of LC_CTYPE, are significant in comparisons` which sounds like the sort would ignore underscores and digits which isn't what the OP wants. – Ed Morton May 25 '22 at 13:20
  • It does only sound _like the sort would ignore_ _digits_ if you don't consider them to be _`numeric`_. ;-) Regarding underscores, it's admittedly less clear. – Armali May 25 '22 at 13:30
  • Ah, I apparently read "alphanumeric" too quickly and missed the back half of it :-). In any case, it sounds like what gets sorted isn't the problem the OP is trying to solve, it's the order they get sorted in and so the solution may be a locale setting if any exist where Underscores > numbers > letters as the OP wants. – Ed Morton May 25 '22 at 13:36
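You can check the point raised in these comments with a few made-up names, assuming the C locale. With `-d`, the underscore is not sorted to the top. It is simply skipped during comparison, so `_z` sorts as if it were `z`.

```shell
# -d ignores everything except blanks and alphanumerics when comparing;
# -f folds case. "_z" is therefore compared as "z".
printf '%s\n' _z a 1 | LC_ALL=C sort -df
# 1
# a
# _z
```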