
Given a file with the lines (aa ac a-b), Unix sort produces (aa a-b ac) instead of the expected (a-b aa ac).

It is as if sort is ignoring the '-' character.

Interestingly, a dash by itself is sorted correctly: (a b c -) sorts to (- a b c).

Why? Is there any way to change this behavior?

2 Answers


The sort order behaviour of sort(1) is controlled by your locale settings (see man locale).

There are a number of different locale settings, for example:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
...
LC_ALL=

To get the desired sort behaviour, you need to choose the correct LC_COLLATE value. In this case, the standard built-in C (POSIX) locale is suitable:

$ sort testcase
aa
a-b
ac

$ LC_COLLATE=C sort testcase
a-b
aa
ac

If you prefer, you can set all the locale categories at once (which is more consistent) by setting LC_ALL=C. Note that LC_ALL, when set, takes precedence over LC_COLLATE. Since these are environment variables, you can make the sort order permanent by putting export LC_ALL=C, or similar, in your shell startup script.
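If you would rather not change the locale for the whole session, the override can be scoped to a single command. The following is a minimal sketch, assuming a POSIX shell; the temporary directory and the variable name c_sorted are only illustrative, and the file contents mirror the testcase example above. It uses LC_ALL rather than LC_COLLATE so that it still works if LC_ALL is already set in the environment.

```shell
# Recreate the three-line test file from the example (path is illustrative).
tmpdir=$(mktemp -d)
printf 'aa\nac\na-b\n' > "$tmpdir/testcase"

# The VAR=value prefix applies to this one sort invocation only;
# the locale of the surrounding shell session is left untouched.
c_sorted=$(LC_ALL=C sort "$tmpdir/testcase")
printf '%s\n' "$c_sorted"

# Clean up the temporary file.
rm -r "$tmpdir"
```

This keeps other programs in the session (date formats, messages, and so on) in your usual locale while still giving byte-order sorting where you need it.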

Lockie

Setting the environment variable LC_ALL=C changes the behavior of sort. The default locale's sort order must be treating '-' specially.
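To see why the C locale gives the order the question expected, it can help to look at the byte values involved. This is a small sketch, assuming a POSIX shell and an ASCII-compatible encoding; the variable names are made up for illustration. In the C locale, sort compares lines byte by byte, so the dash is an ordinary character that happens to have a lower value than any letter.

```shell
# printf with a leading quote in the argument prints the character's numeric value.
dash_code=$(printf '%d' "'-")   # 45 in ASCII
a_code=$(printf '%d' "'a")      # 97 in ASCII
echo "dash=$dash_code a=$a_code"

# With byte-order comparison, a lone dash sorts ahead of the letters.
lone=$(printf 'a\nb\nc\n-\n' | LC_ALL=C sort)
printf '%s\n' "$lone"
```

Since 45 is less than 97, "a-b" compares lower than "aa" at the second character, which matches the (a-b aa ac) ordering.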