bash sort / uniq -c: how to use tab instead of space as delimiter in output?

Question

I have a file strings.txt listing strings, which I am processing like this:

sort strings.txt | uniq -c | sort -n > uniq.counts

The resulting file uniq.counts lists the unique strings sorted in ascending order by count, something like this:

 1 some string with    spaces
 5 some-other,string
25 most;frequent:string

Note that the strings in strings.txt may contain spaces, commas, semicolons and other separators, but never a tab. How can I get uniq.counts to be in this format:

 1<tab>some string with    spaces
 5<tab>some-other,string
25<tab>most;frequent:string

This isn't really a question about `sort` (or, rather, changing the delimiter used by `sort` is both trivial and shown in the man page, as `-t` aka `--field-separator`; thus `sort -t $'\t'` would suffice to answer the whole of the question posed in the original title); the interesting part is how to change the delimiter used by `uniq -c` to a tab. — Charles Duffy, Jul 12 '16 at 19:26
As chepner briefly commented -- even with IFS at defaults, `while read -r count content; do ...` would succeed in parsing the count from the rest of the output in `uniq.counts` with the original output format, without need for a distinct character. — Charles Duffy, Jul 12 '16 at 19:31
Does this answer your question? [Why uniq -c output with space instead of \t?](https://stackoverflow.com/questions/11670393/why-uniq-c-output-with-space-instead-of-t) — Pablo Bianchi, Apr 01 '20 at 02:22
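The `while read -r count content` approach from the comments can be sketched as follows. It is a minimal illustration, not code from the thread: the function name `counts_to_tsv` is hypothetical. With the default IFS, `read` skips the leading blanks, puts the first field in `count` and the rest of the line (internal spaces intact) in `content`; `printf` then joins them with a tab.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `uniq -c` output on stdin and writes
# "count<TAB>string" lines using only shell builtins.
counts_to_tsv() {
    local count content
    # Default IFS: leading blanks are skipped, the first field goes to
    # $count and the remainder of the line goes to $content.
    while read -r count content; do
        printf '%s\t%s\n' "$count" "$content"
    done
}

# Usage with the question's pipeline:
# sort strings.txt | uniq -c | sort -n | counts_to_tsv > uniq.counts
```

Unlike the desired output in the question, this drops the padding in front of the count. Also, `read` trims leading and trailing blanks from `content`, so strings that begin or end with spaces would lose them.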

score 4 · Answer 1 · edited Apr 24 '19 at 14:48


You can do:

sort strings.txt | uniq -c | sort -n | sed -E 's/^ *//; s/ /\t/' > uniq.counts

sed first removes all leading spaces (the padding before the count) and then replaces the first remaining space, the one right after the count, with a tab character. (`\t` in the replacement is a GNU sed extension.)
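Here is a sketch of the same two substitutions that does not depend on GNU sed's `\t`. It assumes a sed that accepts `-E`, which GNU and BSD sed both do. The tab is a literal character produced by `printf` and spliced into the script. The function name `strip_and_tab` is hypothetical.

```shell
#!/bin/sh
# A literal tab character; command substitution strips only trailing
# newlines, so the tab survives.
tab=$(printf '\t')

# Hypothetical helper: strip the leading blanks, then turn the first
# remaining space (the one right after the count) into a tab.
strip_and_tab() {
    sed -E "s/^ *//; s/ /${tab}/"
}

# Usage:
# sort strings.txt | uniq -c | sort -n | strip_and_tab > uniq.counts
```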

edited Apr 24 '19 at 14:48 by glicerico

answered Jul 12 '16 at 19:22 by anubhava

score 3 · Accepted Answer · answered Jul 12 '16 at 19:21

You can pipe the output of the sort/uniq pipeline through sed before writing to uniq.counts, e.g. add:

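| sed -e 's/^\( *[0-9][0-9]*\) /\1\t/' > uniq.counts

The ` *` inside the group matches the blanks that uniq -c uses to right-align the count, so the padding is kept; the single space after the count is the one replaced by the tab.

The full pipeline would be:

$ sort strings.txt | uniq -c | sort -n | \
sed -e 's/^\( *[0-9][0-9]*\) /\1\t/' > uniq.counts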

(line continuation included for clarity)
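uniq -c right-aligns its counts with leading blanks, so a pattern anchored as `^[0-9]` finds no match on its output. The sketch below allows optional spaces inside the captured group and consumes the one space after the count. The padding is therefore kept, matching the format asked for in the question. It assumes GNU sed for `\t` in the replacement, and the function name `pad_keep_tab` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: capture the padded count, drop the single space
# that follows it, and put a tab there instead (GNU sed "\t").
pad_keep_tab() {
    sed -e 's/^\( *[0-9][0-9]*\) /\1\t/'
}

# Usage:
# sort strings.txt | uniq -c | sort -n | pad_keep_tab > uniq.counts
```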

score 2 · Answer 3 · answered Jul 12 '16 at 19:21

With GNU sed:

sort strings.txt | uniq -c | sort -n | sed -r 's/([0-9]) /\1\t/' > uniq.counts

Output to uniq.counts:

 1      some string with    spaces
 5      some-other,string
25      most;frequent:string

If you want to edit an existing file in place, use sed's -i option.
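A minimal sketch of the in-place variant follows. It assumes GNU sed, where `-r` enables extended regular expressions and `-i` rewrites the named file. The two-line sample written to a temporary file is illustrative only and stands in for an existing uniq.counts.

```shell
#!/bin/sh
# Illustrative sample standing in for existing `uniq -c` output.
f=$(mktemp)
printf '      1 some string with    spaces\n     25 most;frequent:string\n' > "$f"

# The leftmost "digit followed by space" is the last digit of the count,
# so only the separator after the count becomes a tab; the file is
# rewritten in place.
sed -r -i 's/([0-9]) /\1\t/' "$f"

cat "$f"
```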
