How can I sort a text file by a specific string?

Question

I have a text file with the following lines:

 Ca4  0.500001 0.000000 0.000000
 C4   0.750001 0.500000 0.000000
 O10  0.750001 0.243180 0.000000
 O8   0.652432 0.628410 -0.779621
 O12  0.847569 0.628410 0.779621
 Ca3  0.120090 0.500000 -3.035668
 C3   0.370090 0.000000 -3.035668
 O9   0.370090 -0.256820 -3.035668
 O7   0.272522 0.128410 -3.815289
 O11  0.467659 0.128410 -2.256048
 Ca1  0.000000 0.000000 0.000000
 C2   0.250000 0.500000 0.000000
 O4   0.250000 0.756820 0.000000
 O6   0.152432 0.371590 -0.779621
 O2   0.347569 0.371590 0.779621
 Ca2  0.620091 0.500000 -3.035668
 C1   0.870091 0.000000 -3.035668
 O3   0.870091 0.256820 -3.035668
 O5   0.772522 -0.128410 -3.815289
 O1   0.967660 -0.128410 -2.256048

What I want to do is simply reorder the lines so that the lines containing the string "Ca" come first and the rest of the lines stay as they are.

I tried using

 grep "Ca" file | sort

but it only prints the lines containing "Ca" to the screen.

Any suggestions?

kojiro · Accepted Answer · 2015-09-03T17:10:38.197

2

You pretty much have to run two filters. You can sort of avoid explicitly opening the file twice by using tee:

< file tee >(grep ^Ca > ca) | grep -v ^Ca > noca
cat ca noca > newfile
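Because bash runs a process substitution asynchronously, the grep writing ca may not have finished when cat runs, so newfile could end up missing some Ca lines. A minimal sketch that avoids both that race and the intermediate files, at the cost of reading the input twice, simply runs the two filters one after the other; bash is assumed and the function name ca_first is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: print the lines that start with "Ca" first, then every other
# line in its original order. The input file is read twice (once per grep).
ca_first() {
  local file=$1
  # First pass: only the lines beginning with "Ca".
  grep '^Ca' "$file"
  # Second pass: everything else; -v inverts the match.
  grep -v '^Ca' "$file"
}
```

Usage: `ca_first file > newfile`. The two greps are separated by a newline rather than `&&`, so the second one still runs when the file has no Ca lines at all.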

If you also want to sort the Ca lines among themselves:

< file tee >(grep ^Ca | sort > ca) | grep -v ^Ca > noca
cat ca noca > newfile
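Plain sort compares the labels character by character, so with ten or more calcium atoms Ca10 would land before Ca2. If that matters, here is a sketch assuming bash and GNU sort (its V ordering, "version sort", is a GNU extension and not POSIX); the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: "Ca" lines first, sorted by label with natural number order
# (Ca2 before Ca10), then all other lines unchanged.
# Assumes GNU sort: the V (version sort) key modifier is not POSIX.
ca_first_sorted() {
  local file=$1
  # -k1,1V: compare only the first field, using version-number ordering.
  grep '^Ca' "$file" | sort -k1,1V
  grep -v '^Ca' "$file"
}
```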

If it's really important to you not to open the file twice, you can use awk:

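awk '/^Ca/ { print; next }    # Ca lines: print right away
     { rest[++n] = $0 }       # other lines: buffer them in input order
     # a numeric loop keeps that order; the order of "for (k in rest)" is unspecified
     END { for (i = 1; i <= n; i++) print rest[i] }' file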

but this approach can use a lot of memory, because it holds all the non-Ca lines until the end of the input.
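If the buffered lines are too large for memory, one possible trade-off is to spool the non-Ca lines to a temporary file instead of an awk array. The input is still read only once, but a second, temporary file is written and then read back. A minimal sketch assuming bash and an available mktemp; ca_first_spool is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: one pass over the input; non-"Ca" lines are spooled to a
# temporary file instead of being held in awk's memory.
ca_first_spool() {
  local file=$1 tmp
  tmp=$(mktemp) || return
  # Ca lines go straight to stdout; all other lines go to the temp file.
  awk -v spool="$tmp" '/^Ca/ { print; next } { print > spool }' "$file"
  # awk has exited (and flushed) here, so append the spooled lines.
  cat "$tmp"
  rm -f "$tmp"
}
```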

edited Sep 03 '15 at 17:10

answered Sep 03 '15 at 16:46

kojiro


That's right! Many thanks, kojiro. I don't need to sort the Ca lines at the moment, but it's good to know. – git Sep 03 '15 at 16:54
Does anyone have a way that reads the input file only once? – git Sep 03 '15 at 17:03
@git ok, updated to only open the file once, using awk. – kojiro Sep 03 '15 at 17:10
`awk '{ print NR + ($1 ~ /Ca/ ? 1000000 : 9000000) "\t" $0 }' file | sort | cut -f2- >new` but it's probably less efficient. – tripleee Sep 03 '15 at 17:13
OK, I understand; it's probably better to use kojiro's option. Many thanks – git Sep 03 '15 at 17:14
I was thinking of something like an ed command to move the lines matching a string, but maybe it can't be done that way... – git Sep 03 '15 at 17:17

cerkiewny · Answer 2 · 2015-09-03T17:10:39.903

1

grep "Ca" file | sort;  grep -v  "Ca" file | sort

This will do what you need: first it outputs the sorted lines containing "Ca", then the sorted remaining lines that do not contain "Ca". Note the -v option to grep, which inverts the match.

Also, if you need the output as a single stream, you can group the commands in braces; joined with &&, the command would look like this:

{ grep "Ca" file | sort &&  grep -v  "Ca" file | sort; }

edited Sep 03 '15 at 17:10

answered Sep 03 '15 at 16:47

cerkiewny


That still doesn't make any sense. The usual statement delimiter is newline or semicolon. Using `&&` implies that the second command is conditional on the success of the first. – tripleee Sep 03 '15 at 17:00
Sorry, it doesn't work; although the file does not print, I don't want to sort all the lines. – git Sep 03 '15 at 17:02
Then take out the `| sort`s. – tripleee Sep 03 '15 at 17:06

karakfa · Answer 3 · 2015-09-03T19:33:00.750

0

Here is an alternative solution

 nl -n rz file | awk -v OFS='\t' '/Ca/{ $1 = "#" $2 } { $1 = $1 } 1' | LC_ALL=C sort -k1,1 | cut -f2-

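To keep it simple, the output is tab-separated (awk rebuilds every line with OFS set to a tab). LC_ALL=C makes sort compare bytes, so the "#" keys reliably sort before the zero-padded line numbers.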

Explanation: number the lines to preserve the order of the other rows, replace the line number with a sort key for the rows that should move to the top, then sort and discard the key.
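The same decorate-sort-undecorate idea can keep each line exactly as it was (no switch to tabs) if the key is prefixed rather than replacing a field: 0 for lines starting with Ca, 1 for the rest, plus the line number so the original order survives within each group. A sketch assuming bash (for `$'\t'`); the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: decorate each line with "group<TAB>line-number<TAB>", sort on
# those two numeric keys, then cut the prefix off again. Lines starting
# with "Ca" get group 0, everything else group 1; the original text of
# every line is passed through unchanged.
ca_first_dsu() {
  local file=$1
  awk '{ printf "%d\t%d\t%s\n", ($0 ~ /^Ca/ ? 0 : 1), NR, $0 }' "$file" |
    sort -t $'\t' -k1,1n -k2,2n |
    cut -f3-
}
```

Usage: `ca_first_dsu file > newfile`. Unlike the sorted variants, this keeps the Ca lines in their original relative order, which matches the question.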

edited Sep 03 '15 at 19:33

answered Sep 03 '15 at 18:53

karakfa

