I have a SAM file with an RX: field containing 12 bases separated in the middle by a -
i.e. RX:Z:CTGTGC-TCGTAA
I want to remove the hyphen from this field, but I can't simply remove all hyphens from the whole file as the read names contain them, like 1713704_EP0004-T
Have mostly been trying tr,
but this is just removing all hyphens from the file.:
tr -d '"-' < sample.fq.unaln.umi.sam > sample.fq.unaln.umi.re.sam
input is a large SAM file of >10,000,000 lines like this:
1902336-103-016_C1D1_1E-T:34 99 chr1 131341 36 146M = 131376 182 GGACAGGGAGTGTTGACCCTGGGCGGCCCCCTGGAGCCACCTGCCCTGAAAGCCCAGGGCCCGCAACCCCACACACTTTGGGGCTGGTGGAACCTGGTAAAAGCTCACCTCCCACCATGGAGGAGGAGCCCTGGGCCCCTCAGGGG NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN MC:Z:147M MD:Z:83T62cD:i:4 cE:f:0 PG:Z:bwa RG:Z:A MI:Z:34 NM:i:1 cM:i:3 MQ:i:36 UQ:i:45 AS:i:141 XS:i:136 RX:Z:CTGTGC-TCGTAA
Desired output (i.e. last field)
1902336-103-016_C1D1_1E-T:34 99 chr1 131341 36 146M = 131376 182 GGACAGGGAGTGTTGACCCTGGGCGGCCCCCTGGAGCCACCTGCCCTGAAAGCCCAGGGCCCGCAACCCCACACACTTTGGGGCTGGTGGAACCTGGTAAAAGCTCACCTCCCACCATGGAGGAGGAGCCCTGGGCCCCTCAGGGG NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN MC:Z:147M MD:Z:83T62cD:i:4 cE:f:0 PG:Z:bwa RG:Z:A MI:Z:34 NM:i:1 cM:i:3 MQ:i:36 UQ:i:45 AS:i:141 XS:i:136 RX:Z:CTGTGCTCGTAA
How do I solve this problem?