I am trying to use bwa mem to align sequence reads to the hg19 reference, but my sequences all have a UMI (unique molecular identifier). I used umitools like so:

umitools trim --end 5 input.fastq NNNNNN > output.fastq

This properly appended my UMI sequence to the name line in output.fastq, but when I then align with bwa mem, I get the error:

paired reads have different names: "someTitle:UMI_ATGCTC", "someTitle:UMI_CATTAT"

Is there a way to use both bwa mem and umitools together so this doesn't happen?

The Nightman

1 Answer

This doesn't entirely answer the question, but it gets close. umitools does not work for paired-end reads as is. To get around this, I trimmed off my UMI sequences (6 bp on each side of the reads) with the following command and then aligned:

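sed -i~ '2~2s/^.\{6\}//' file

The address 2~2 means "start on line 2, repeat every 2 lines", so the command hits the sequence line (line 2) and the quality line (line 4) of each four-line FASTQ record. Both must lose the same 6 characters, otherwise the sequence and quality strings end up with different lengths.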
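
To see which lines a first~step address selects (this address form is a GNU sed extension), one can number a few lines and print only the matches. A small sketch, using seq to stand in for a 12-line file; the variable names are only for illustration:

```shell
# Lines selected by "start at line 2, then every 4th line".
every_fourth=$(seq 1 12 | sed -n '2~4p' | paste -sd' ' -)

# Lines selected by "start at line 2, then every 2nd line".
every_second=$(seq 1 12 | sed -n '2~2p' | paste -sd' ' -)

echo "2~4 selects: $every_fourth"   # 2 6 10
echo "2~2 selects: $every_second"   # 2 4 6 8 10 12
```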

s means replace, ^ matches the line beginning, . matches any character, \{6\} specifies the length (a "quantifier"). The replacement string is empty (//).

-i~ replaces the file in place, leaving a backup with the ~ appended to the filename.
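
Putting the pieces together on a tiny, made-up FASTQ file shows the effect. This is a sketch that assumes GNU sed; demo.fastq and its two reads are hypothetical, with the first 6 bases of each read standing in for the UMI. It uses the 2~2 address so that the quality line is shortened along with the sequence line, since FASTQ requires the two to have the same length:

```shell
# Hypothetical two-record FASTQ: 6 bp "UMI" followed by 10 bp of read.
printf '%s\n' \
  '@read1' 'ATGCTCGGGTTTAAAC' '+' 'IIIIIIFFFFFFFFFF' \
  '@read2' 'CATTATCCCAAAGGGT' '+' 'IIIIIIFFFFFFFFFF' > demo.fastq

# Remove the first 6 characters from every even line (sequence and quality).
# The untouched original is kept as demo.fastq~.
sed -i~ '2~2s/^.\{6\}//' demo.fastq

# Header and '+' lines are unchanged; sequence and quality are now 10 characters each.
cat demo.fastq
```

Run the same command on both mate files of a pair. Because the name lines are never modified, the two files keep matching read names, which is what bwa mem checks for; the trade-off is that the UMI is discarded rather than carried along.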

The Nightman