
My text files contain header lines (starting with `>`) like these, mixed in with data rows:

>SCRT2_DBD_NNGCAACAGGTGN
0.455331585111  0.0458438972816 0.145508011584  0.353316506023
0.173692317806  0.0247846149283 0.759302422526  0.0422206447403
1.16863332073e-07       0.940983666713  1.16863332073e-07       0.0590160995601
0.00506737765087        7.91765386614e-08       0.988123281671  0.00680926150142
0.0623177863824 0.93243216705   0.000777853090471       0.00447219347766
0.00453077729507        0.995469025719  9.8493017436e-08        9.8493017436e-08
0.507583592195  0.453364643178  0.0180440139317 0.0210077506946
>SNAI2_DBD_NRCAGGTGN
0.455331585111  0.0458438972816 0.145508011584  0.353316506023
0.173692317806  0.0247846149283 0.759302422526  0.0422206447403
>SP1_DBD_GCCMCGCCCMC
0.455331585111  0.0458438972816 0.145508011584  0.353316506023
0.173692317806  0.0247846149283 0.759302422526  0.0422206447403
1.16863332073e-07       0.940983666713  1.16863332073e-07       0.0590160995601
0.00506737765087        7.91765386614e-08       0.988123281671  0.00680926150142
0.0623177863824 0.93243216705   0.000777853090471       0.00447219347766
0.00453077729507        0.995469025719  9.8493017436e-08        9.8493017436e-08
0.507583592195  0.453364643178  0.0180440139317 0.0210077506946

And I want to get this:

>M_SCRT2
0.455331585111  0.0458438972816 0.145508011584  0.353316506023
0.173692317806  0.0247846149283 0.759302422526  0.0422206447403
1.16863332073e-07       0.940983666713  1.16863332073e-07       0.0590160995601
0.00506737765087        7.91765386614e-08       0.988123281671  0.00680926150142
0.0623177863824 0.93243216705   0.000777853090471       0.00447219347766
0.00453077729507        0.995469025719  9.8493017436e-08        9.8493017436e-08
0.507583592195  0.453364643178  0.0180440139317 0.0210077506946
>M_SNAI2
0.455331585111  0.0458438972816 0.145508011584  0.353316506023
0.173692317806  0.0247846149283 0.759302422526  0.0422206447403
>M_SP1
0.455331585111  0.0458438972816 0.145508011584  0.353316506023
0.173692317806  0.0247846149283 0.759302422526  0.0422206447403
1.16863332073e-07       0.940983666713  1.16863332073e-07       0.0590160995601
0.00506737765087        7.91765386614e-08       0.988123281671  0.00680926150142
0.0623177863824 0.93243216705   0.000777853090471       0.00447219347766
0.00453077729507        0.995469025719  9.8493017436e-08        9.8493017436e-08
0.507583592195  0.453364643178  0.0180440139317 0.0210077506946

I don't want to do this manually, because there are too many of these lines.

Please help with a one-liner in awk or Perl.

ysth
Angelo
  • Here's a one-liner that might help: [`perldoc perlfaq5`](https://metacpan.org/pod/perlfaq5#How-do-I-change-delete-or-insert-a-line-in-a-file-or-append-to-the-beginning-of-a-file) – ThisSuitIsBlackNot Sep 09 '14 at 16:57

3 Answers

Using awk:

$ awk -F"[>_]" '/^>/{ print ">M_" $2; next }1' file
>M_SCRT2
>M_SNAI2
>M_SP1
>M_SP3

Using perl:

$ perl -F"[>_]" -lane 'print /^>/ ? ">M_$F[1]" : $_' file
>M_SCRT2
>M_SNAI2
>M_SP1
>M_SP3
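
The output above lists only the header lines. As a minimal sketch of how the same awk command treats the data rows in between, assume a hypothetical file `sample.txt` holding two records from the question and a hypothetical wrapper function `rename_headers` (neither name comes from the original post):

```shell
# Hypothetical sample file built from the question's data.
cat > sample.txt <<'EOF'
>SCRT2_DBD_NNGCAACAGGTGN
0.455331585111  0.0458438972816 0.145508011584  0.353316506023
>SNAI2_DBD_NRCAGGTGN
0.173692317806  0.0247846149283 0.759302422526  0.0422206447403
EOF

# Hypothetical wrapper around the awk one-liner from this answer.
# Lines starting with ">" are split on ">" and "_", so $2 is the name
# before the first underscore; they are printed as ">M_<name>".
# All other lines fall through to the trailing `1` and print unchanged.
rename_headers() {
    awk -F"[>_]" '/^>/{ print ">M_" $2; next }1' "$@"
}

rename_headers sample.txt
```

With this input the two header lines should come out as `>M_SCRT2` and `>M_SNAI2`, while the numeric rows are passed through untouched.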
jaypal singh

Alternatives:

perl -pe 's/>(.*?)_.*/>M_$1/'
perl -pe 's/_.*//;s/>/>M_/'

Or with sed:

sed 's/_.*//;s/>/>M_/'
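
The two-step substitutions above are not anchored, so `s/_.*//` would also truncate any non-header line that happens to contain an underscore. The sample data has none, so they work there. A slightly more defensive sketch restricts both substitutions to lines starting with `>`; the function name `rename_headers_strict` is hypothetical:

```shell
# Hypothetical wrapper: apply both substitutions only to lines
# that begin with ">", leaving every other line untouched.
rename_headers_strict() {
    sed '/^>/{s/_.*//;s/^>/>M_/;}' "$@"
}

# The second input line contains "_" but is not a header, so it is kept as-is.
printf '>SP1_DBD_GCCMCGCCCMC\nrow_with_underscore\n' | rename_headers_strict
```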
clt60

You could try the following sed command:

sed 's/^>\([^_]*\).*$/>M_\1/' file

Example:

$ sed 's/^>\([^_]*\).*$/>M_\1/' file
>M_SCRT2
>M_SNAI2
>M_SP1
>M_SP3
Avinash Raj
  • Hi Avinash, the thing is that these lines appear at random positions in the text file, not one after the other. Please check the edited sample file. – Angelo Sep 09 '14 at 17:00
  • Sure, where is the sample? – Avinash Raj Sep 09 '14 at 17:01
  • @Angelo it does the replacement only on lines that start with `>`, so the above command would work for the updated sample. – Avinash Raj Sep 09 '14 at 17:04
  • Yeah, I see. I don't know why, but it is not working on my original data. I will try to fix it myself. Many thanks. – Angelo Sep 09 '14 at 17:07
  • @Angelo to save the changes to the file itself, add the in-place edit option `-i` to the sed command: `sed -i 's/^>\([^_]*\).*$/>M_\1/' file` – Avinash Raj Sep 09 '14 at 17:08
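
Since `-i` overwrites the file, it may be worth keeping a backup on the first run. A minimal sketch, assuming a hypothetical file `motifs.txt`: attaching a suffix directly to `-i` (here `.bak`) makes sed save the original as `motifs.txt.bak` before editing, and this attached form is accepted by both GNU and BSD sed.

```shell
# Hypothetical input file with one header line and one data row.
printf '>SP1_DBD_GCCMCGCCCMC\n0.455331585111\t0.0458438972816\n' > motifs.txt

# Edit in place; the untouched original is kept as motifs.txt.bak.
sed -i.bak 's/^>\([^_]*\).*$/>M_\1/' motifs.txt

cat motifs.txt
```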