In reference to this question regarding column-to-row transposition using awk, I am baffled as to why I've been unable to display the last column in a row. I suspect the CR from the end of the row is being added to the array that gets printed.

This appears to be the case, as per the linked question on DOS line endings. How do I remove the CR from the last field within the awk script I am currently using (without running something like dos2unix)?

    {
        if (!keys[$3]++) { b[++c] = $3; row1 = row1 OFS $3; row2 = row2 OFS $4 }
        line = groups[$1][$3]
        groups[$1][$3] = (line == "" ? $6$7 : line OFS $6$7)
    }
    END {
        print row1 ORS row2
        for (i in groups) {
            r = i
            for (j in b) r = r OFS groups[i][b[j]]
            print r
        }
    }

The column with the suspected CR is $7, e.g.:

22405   XRJ27   IL17C       rs4673      C______2038_20  N   N
22405   XRJ27   CRP     rs2794520   C____177486_10  T   T
22405   XRJ27   G6PC2       rs560887    C____323082_10  C   C
22405   XRJ27   TCN2        rs1801198   C____325467_10  G   G
22405   XRJ27   SLC30A8     rs13266634  C____357888_10  C   C
22405   XRJ27   COL5A1      rs12722     C____370252_20  C   C
22405   XRJ27   LEPR        rs1137100   C____518168_20  A   G
gungu
  • Possible duplicate of [Why does my tool output overwrite itself and how do I fix it?](https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it) – Sundeep Apr 09 '18 at 10:30
  • Confirmed that it is the DOS line-ending issue, and it can be addressed by a tool such as dos2unix. However, it would be useful if this could be done in awk. – gungu Apr 09 '18 at 10:50
  • the linked question does show using `sub` as well within awk... try it... – Sundeep Apr 09 '18 at 11:11
  • Perhaps by specifying on the command line or BEGIN section that RS is '\r\n'? – ReluctantBIOSGuy Apr 09 '18 at 12:15
  • @ReluctantBIOSGuy great idea! I would even go for `\r?\n` in case the file is the result of copy pasting from several unix/dos files. So some might have the `\r` and others not. – kvantour Apr 09 '18 at 12:40
  • @ReluctantBIOSGuy good point.. however that'll require `gawk`.. this is explained as well in the linked question.. – Sundeep Apr 09 '18 at 12:45
  • The OP is already using gawk, see her gawk-only array syntax (`groups[$1][$3]`) – Ed Morton Apr 09 '18 at 15:50
  • @kvantour files that use `\r\n` (carriage-return linefeed) for the newline often have `\n` (linefeed) alone in the middle of fields. If you use `\r?\n` as the RS value then you can't separate the mid-field linefeeds from the linefeeds that are part of the newline so YMMV with using that and it'd be gawk-only (which the OP does just happen to be using). See https://stackoverflow.com/q/45420535/1745001. – Ed Morton Apr 09 '18 at 16:01
  • @EdMorton Thanks a lot. For some reason I never realised that `RS` in POSIX only picks the first character and is not a BRE. Apparently too much GNU on my side ;-) – kvantour Apr 10 '18 at 08:18
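
The `sub` approach mentioned in the comments could look roughly like the sketch below. It assumes each record ends in a single CR just before the newline. The function name `strip_cr_demo` and the one-row input are only illustrative (the row is modeled on the sample data in the question). Because `sub` modifies `$0`, awk re-splits the fields, so `$7` no longer carries the CR afterwards.

```shell
# Sketch: remove a trailing carriage return from each record before
# any field is used. strip_cr_demo is a hypothetical helper name.
strip_cr_demo() {
    awk '
        { sub(/\r$/, "") }          # edit $0; fields are re-split, so $7 is clean
        { print "[" $6 $7 "]" }     # brackets make a stray CR easy to spot
    '
}

# One CRLF-terminated row modeled on the sample data in the question
printf '22405\tXRJ27\tLEPR\trs1137100\tC____518168_20\tA\tG\r\n' | strip_cr_demo
# prints [AG]
```

In the script from the question, the same `sub(/\r$/, "")` statement would presumably go first in the main block, before `$3`, `$6` and `$7` are read. The `RS = "\r\n"` alternative from the comments relies on multi-character `RS` support, which the comments note is a gawk feature.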
