I have a file in which I want to update the first field of specific columns (say columns 1 and 2) whenever the 5th field of that column contains a pipe (|).

I could use Python, but splitting the lines, substituting the values, and joining them back together would make for a long script. I am looking for a short solution, preferably in awk, though other tools are fine too. I also want to embed this in a Python script.

Below are two columns from my data; the fields within each column are separated by colons (:).

0/1:42,19:61:99:0|1:5185_T_TTCTATC:560,0,1648       0/1:38,34:72:99:0|1:5185_T_TTCTATC:1145,0,1311

0/0:124,0,0:124:99:0,120,1800,120,1800,1800    0/0:165,0,0:165:99:0,120,1800,120,1800,1800

0/0:152,0:152:99:.:.:0,120,1800    0/1:145,34:179:99:0|1:5398_A_G:973,0,6088

So, when the 5th field of a column contains '|', the first field of that column is replaced with the 5th field's value.

Expected result:

0|1:42,19:61:99:0|1:5185_T_TTCTATC:560,0,1648       0|1:38,34:72:99:0|1:5185_T_TTCTATC:1145,0,1311

0/0:124,0,0:124:99:0,120,1800,120,1800,1800    0/0:165,0,0:165:99:0,120,1800,120,1800,1800

0/0:152,0:152:99:.:.:0,120,1800    0|1:145,34:179:99:0|1:5398_A_G:973,0,6088

Actually, there are many columns. Say columns of this kind appear after the 5th position (in Python indexing), and I want to do the substitution in every column after the 5th. How can I approach the problem?

Thanks,

everestial007
  • Can you please provide an awk answer though? Embedding it in python is just another problem and I will deal with it later. Thanks ! – everestial007 Dec 27 '16 at 23:32
  • @EdMorton: Could you please provide the awk solution? – everestial007 Dec 27 '16 at 23:42
  • Hi @EdMorton: I hope someone will provide an answer soon. But sometimes I have to wait a couple of hours, and from tomorrow nobody is going to look at this again. If you can provide an awk solution, I can at least proceed with what I want to do with my files now. I can wait another day for a more complete answer. Thanks! – everestial007 Dec 27 '16 at 23:45

1 Answer

$ awk '{ for (i=1;i<=NF;i++) { split($i,f,/:/); if (f[5]~/\|/) sub(/^[^:]+/,f[5],$i) } }1' file
0|1:42,19:61:99:0|1:5185_T_TTCTATC:560,0,1648 0|1:38,34:72:99:0|1:5185_T_TTCTATC:1145,0,1311
0/0:124,0,0:124:99:0,120,1800,120,1800,1800    0/0:165,0,0:165:99:0,120,1800,120,1800,1800
0/0:152,0:152:99:.:.:0,120,1800 0|1:145,34:179:99:0|1:5398_A_G:973,0,6088

The only caveat is that the 5th subfield can't contain an &, since an unescaped & in the replacement argument of sub() stands for the text that was matched rather than for a literal ampersand.
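If an & in the 5th subfield is a real possibility, one way around the caveat is to assign the rebuilt column directly instead of calling sub(). The following is only a sketch, assuming a POSIX awk; the function name `phase_gt` is made up for illustration and is not part of the original answer. Unlike the sub() version, it would also prepend the 5th subfield when the first subfield is empty.

```shell
# phase_gt is a hypothetical helper: reads lines on stdin, writes rewritten lines on stdout.
phase_gt() {
    awk '{
        for (i = 1; i <= NF; i++) {
            n = split($i, f, ":")
            if (n >= 5 && f[5] ~ /\|/)
                # Plain concatenation, so any "&" in f[5] stays literal:
                # new column = 5th subfield + everything after the 1st subfield.
                $i = f[5] substr($i, length(f[1]) + 1)
        }
    } 1'
}

# One line built from the sample data in the question.
printf '%s\n' '0/1:42,19:61:99:0|1:5185_T_TTCTATC:560,0,1648 0/0:152,0:152:99:.:.:0,120,1800' | phase_gt
# prints: 0|1:42,19:61:99:0|1:5185_T_TTCTATC:560,0,1648 0/0:152,0:152:99:.:.:0,120,1800
```

Because the column is assigned rather than substituted, no character in the 5th subfield is treated specially.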

If you want to start the replacements at column 5, change i=1 to i=5 in the loop init part.
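As a sketch of that change, the starting column can be passed in as a variable instead of being edited into the script. This assumes tab-separated columns (which the comments under this answer confirm for the asker's file); the function name `phase_from` and the variable `start` are hypothetical. Note that awk columns are 1-based, so if "5th Python index" means zero-based index 5, the value to pass would be 6.

```shell
# phase_from is a hypothetical helper; $1 is the first column (1-based) to rewrite.
phase_from() {
    awk -v start="$1" 'BEGIN { FS = OFS = "\t" }   # assumption: columns are tab-separated
    {
        for (i = start; i <= NF; i++) {
            split($i, f, ":")
            if (f[5] ~ /\|/)
                sub(/^[^:]+/, f[5], $i)
        }
    } 1'
}

# Two columns from the question, joined by a tab; only column 2 onward is rewritten.
printf '%s\t%s\n' \
    '0/1:42,19:61:99:0|1:5185_T_TTCTATC:560,0,1648' \
    '0/1:38,34:72:99:0|1:5185_T_TTCTATC:1145,0,1311' | phase_from 2
```

In this run the first column keeps its `0/1` prefix because it lies before `start`, while the second column becomes `0|1:38,34:...`.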

The same script, broken into lines:

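$ awk '{
    for (i=1;i<=NF;i++) {          # visit every whitespace-separated column
        split($i,f,/:/)            # f[1], f[2], ... hold the colon-separated subfields
        if (f[5]~/\|/)             # does the 5th subfield contain a pipe?
            sub(/^[^:]+/,f[5],$i)  # replace the leading subfield with the 5th
    }
    # the 1 after the closing brace is an always-true pattern, so every line is printed
}1' file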
Ed Morton
  • Hi @Ed Morton: This worked well for the substitution. But in certain lines the tab separator between columns changed to spaces. All lines actually have the same structure. I am trying to see why that happened, but I have no clue. – everestial007 Dec 28 '16 at 00:32
  • You never mentioned your columns were tab separated. What separates columns is just as important to tell us about as what the columns contain when you're asking a question. Change `'{` to `'BEGIN{FS=OFS="\t"} {` so awk knows to use tabs as the separator. – Ed Morton Dec 28 '16 at 00:41
  • Hi @EdMorton: This `awk` command actually saved me a lot of time, and the run time has decreased considerably. Could you add some explanation to the answer so I can understand it in more detail? I am also going through an awk tutorial to learn more. Thanks again! – everestial007 Dec 28 '16 at 23:06
  • Honestly I think it's very obvious what it's doing, but I've split it onto separate lines at the end of my answer to improve the clarity. Take a look and let me know if there's any part of it you don't understand. Wrt an awk tutorial - most online tutorials are complete nonsense; get the book Effective Awk Programming, 4th Edition, by Arnold Robbins. – Ed Morton Dec 28 '16 at 23:47