
I'm trying to extract a list of unique tags from a tagged-text file. Tags are delimited by angle brackets, and each tag name starts with a colon: <:ttx>, <ol_2> and so on.

I started by adding a line-break after each >, then tried sort. The results baffled me, until I realized that sort was ignoring the first two characters.

Is there a switch I need to add, or is my Ubuntu-flavoured bash going for sort -d without the option?

unwind
  • Well, the -d option is _designed_ to ignore non alpha characters. What is the problem exactly? – fge Jan 13 '12 at 10:30
  • You should include the command, some sample input and output, and what you expected it to produce. – l0b0 Jan 13 '12 at 10:35
  • Whenever you see 'weird' behavior from sort the first thing to check is your locale settings and how they might influence things. – sorpigal Jan 13 '12 at 12:35
  • why `` doesn't start with a colon? Is `` a tag? – kev Jan 13 '12 at 13:17

1 Answer


Use LANG=C to disable your locale. sort then compares plain byte values instead of applying locale collation rules, which usually works better:

grep -oE '<:[A-Za-z0-9]+>' your-tagged-text-file | LANG=C sort -u
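
To try the idea without a real file, here is a minimal sketch on made-up input. The sample text and tag names are hypothetical, and allowing an underscore in tag names is an assumption. It uses LC_ALL=C rather than LANG=C because LC_ALL takes precedence over the other locale variables, so it still works when LC_ALL or LC_COLLATE is already set in the environment.

```shell
# Hypothetical sample input; the tag names are made up for illustration.
sample='Intro <:ttx> text <:body> more <:ttx>
and <:a1> then <:body>'

# grep -o prints each match on its own line; -E enables "+",
# so a tag name may be longer than one character.
# LC_ALL=C makes sort compare raw bytes, so "<" and ":" are not ignored.
# sort -u drops duplicate lines.
tags=$(printf '%s\n' "$sample" | grep -oE '<:[A-Za-z0-9_]+>' | LC_ALL=C sort -u)

printf '%s\n' "$tags"
```

With this input the pipeline prints <:a1>, <:body> and <:ttx>, one per line. Because grep -o already puts each match on its own line, there is no need to insert line breaks after each > first.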
oHo