
I'm sorting a list of usernames. When the letters are lowercase, the sort command works as expected.

Expected and actual output for lowercase:

n
n_123
na
na_123

When the characters are uppercase and followed by an underscore, things get weird.

Expected output for uppercase:

N
N_123
NA
NA_123

Actual output for uppercase using sort:

N
NA
NA_123
N_123

I thought I'd be able to solve this using

env LC_COLLATE=C sort "$file"

but no dice.

Actual output using env LC_COLLATE=C sort:

N
NA
NA_123
N_123

I'm running GNU bash, version 4.4.12(1)-release (x86_64-apple-darwin16.3.0) on Mac OS X 10.12.3

Any help would be much appreciated.

Joshua
A-K
  • Thanks for the input. I use Homebrew, and my sort version is sort (GNU coreutils) 5.93. – A-K Mar 06 '17 at 18:33
  • BTW, you don't need `env`: written as-is, `LC_COLLATE=C sort` has the shell export `LC_COLLATE` with the value `C` only for the duration of the `sort` command. – Charles Duffy Mar 06 '17 at 18:35
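
A small sketch of the point in the comment above, assuming bash or another POSIX shell. It also assumes `LC_ALL` is not set; if it is, `LC_ALL` overrides `LC_COLLATE` for both forms alike. The temporary file and the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Start from a known state for the demo: no LC_COLLATE in this shell.
unset LC_COLLATE

# A prefix assignment is exported to this one child process only.
inside=$(LC_COLLATE=C sh -c 'printf %s "$LC_COLLATE"')
echo "inside the command: $inside"

# Back in the calling shell, the variable is still unset.
echo "after the command: ${LC_COLLATE-unset}"

# `env VAR=value cmd` and `VAR=value cmd` hand sort the same environment.
file=$(mktemp)   # hypothetical sample file with the question's usernames
printf '%s\n' N_123 NA_123 NA N > "$file"
with_env=$(env LC_COLLATE=C sort "$file")
without_env=$(LC_COLLATE=C sort "$file")
rm -f "$file"
if [ "$with_env" = "$without_env" ]; then
  echo "env and the prefix assignment give the same order"
fi
```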

1 Answer


Underscore is ASCII 95, which comes after all the uppercase letters A-Z (65-90). So in a byte-by-byte (C locale) comparison, any uppercase letter sorts before `_`, which is why `NA` and `NA_123` end up before `N_123`.
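
The code points quoted above can be checked from the shell. This sketch relies on POSIX `printf`, which prints the numeric value of a character when the argument starts with a quote; `code` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the numeric value of a single character: a leading quote in the
# argument tells printf to output the code of the character that follows.
code() { printf '%d' "'$1"; }

for ch in A N Z _ a n z; do
  printf '%s = %s\n' "$ch" "$(code "$ch")"
done
# A = 65, N = 78, Z = 90, _ = 95, a = 97, n = 110, z = 122
```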

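If you want names compared only up to the first `_`, use `-t _` to make `_` the field separator and `-k1,1` to sort on the first field alone. Names with the same first field (`N` and `N_123`) are then ordered by sort's last-resort whole-line comparison, which puts `N` before `N_123`, giving your expected output: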

sort -t _ -k1,1 file
N
N_123
NA
NA_123
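
A self-contained version of the command above, as a sketch: it recreates the four usernames in a temporary file (standing in for `file`) and forces `LC_ALL=C` so the result does not depend on the local locale settings. Both of those are assumptions made for the demo.

```shell
#!/usr/bin/env bash
# Recreate the question's usernames in a temporary file (illustrative only).
file=$(mktemp)
printf '%s\n' N_123 NA_123 NA N > "$file"

# -t _   : split each line into fields at '_'
# -k1,1  : use only field 1 ("N" or "NA") as the sort key
# Equal keys ("N" vs "N_123") fall back to a whole-line comparison,
# where a line that is a prefix of another comes first.
by_field=$(LC_ALL=C sort -t _ -k1,1 "$file")
printf '%s\n' "$by_field"
# N
# N_123
# NA
# NA_123

rm -f "$file"
```

Note that `-t _` on its own changes nothing: without `-k1,1` the whole line is still the key, `_` takes part in the comparison again, and the original N, NA, NA_123, N_123 order comes back.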

Your sort appeared to work with lowercase names only because lowercase letters (97-122) come after `_` (95), so `n_123` already sorts before `na` in byte order.
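
To see both cases side by side, a short sketch that sorts the lowercase and uppercase samples from the question in byte order (`LC_ALL=C` is an assumption, used to keep the comparison byte-wise on any system):

```shell
#!/usr/bin/env bash
# Lowercase: 'a'..'z' are 97-122, all above '_' (95),
# so "n_123" sorts before "na".
lower=$(printf '%s\n' na_123 na n_123 n | LC_ALL=C sort)

# Uppercase: 'A'..'Z' are 65-90, all below '_',
# so "NA" and "NA_123" sort before "N_123".
upper=$(printf '%s\n' NA_123 NA N_123 N | LC_ALL=C sort)

printf '%s\n' "$lower" "" "$upper"
# n, n_123, na, na_123  then  N, NA, NA_123, N_123
```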

anubhava