I have found plenty of tools for trimming reads in FASTQ format, but are there any available for trimming reads that have already been aligned?
1: http://www.biostars.org/ 2: what does trimming mean? I work with fasta and fastq files all the time, and I have no idea what you're asking. – flies May 04 '12 at 21:52
Trimming means removing bases from the ends of the sequences based on specific criteria. It could be simply a set number from each end or it could be based on quality in the case of fastq. Trimmomatic and the FastX toolkit can do this for fastq, but I am looking for something to do it in a bam file. – JoshuaA May 08 '12 at 18:29
4 Answers
I would personally discourage trimming reads after alignment, especially if the sequences you're trying to trim are adapter sequences.
The presence of adapter sequences will prevent your reads from aligning properly to the genome (in my experience, you'll get a much lower percentage of aligned reads than you should). Since the alignment is already inaccurate at that point, trimming the sequences afterwards is largely pointless (garbage in, garbage out).
You'll be much better off trimming the fastq files before aligning them.

If the reads are already aligned with the reference genome, I don't understand why trimming would be a problem, especially if there is reason to believe that there is DNA (or RNA) damage due to processing. Can you elaborate? – BigHeadEd Aug 06 '21 at 23:43
Do you want the alignment to inform the trimming protocol, or do you want to trim on things like quality values? One approach would be to simply convert back to FASTQ and then use any of the many conventional trimming tools available. You can do the conversion with Picard's SamToFastq:
http://picard.sourceforge.net/command-line-overview.shtml#SamToFastq

I would prefer if the alignment informed the trimming protocol. The reads I want to do this on are RNAseq reads, so split reads would have to be taken into account. I could write something to simply trim the reads and quality scores, but updating the alignment while taking into account the CIGAR string seems a bit tricky. – JoshuaA May 03 '12 at 21:00
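For what it's worth, the CIGAR bookkeeping mentioned above can be sketched in plain Python. This is only a rough illustration, not the method of any tool named in this thread: the function names (`parse_cigar`, `clip_cigar`, `_clip_start`) are made up for the example, it soft-clips rather than removes bases (so SEQ and QUAL stay untouched), and it assumes a 1-based leftmost POS as in SAM. Tags such as NM/MD and the mate fields would still need updating separately.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")  # operations that consume read bases
REF_OPS = set("MDN=X")    # operations that consume reference bases


def parse_cigar(cigar):
    """Turn '50M1000N50M' into [(50, 'M'), (1000, 'N'), (50, 'M')]."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def _clip_start(ops, n):
    """Soft-clip n read bases from the start of ops.

    n counts from the end of the stored read, so bases that are already
    soft-clipped count towards n. Returns (new_ops, reference_shift).
    """
    if n <= 0:
        return list(ops), 0
    ops = list(ops)
    hard = []
    while ops and ops[0][1] == "H":  # hard clips stay on the outside
        hard.append(ops.pop(0))
    clipped = shift = 0
    while ops and clipped < n:
        length, op = ops.pop(0)
        if op in QUERY_OPS:
            take = min(length, n - clipped)
            clipped += take
            if op in REF_OPS:
                shift += take
            if take < length:
                ops.insert(0, (length - take, op))
        elif op in REF_OPS:  # D or N swallowed by the clip (e.g. a splice)
            shift += length
    # The remaining alignment has to start with an aligned base.
    while ops and ops[0][1] not in "M=X":
        length, op = ops.pop(0)
        if op in QUERY_OPS:   # I or S: fold into the soft clip
            clipped += length
        elif op in REF_OPS:   # D or N: skip over it on the reference
            shift += length
    if not ops:
        raise ValueError("no aligned bases left after clipping")
    return hard + [(clipped, "S")] + ops, shift


def clip_cigar(cigar, pos, left=0, right=0):
    """Soft-clip `left`/`right` read bases; return (new_cigar, new_pos)."""
    ops, shift = _clip_start(parse_cigar(cigar), left)
    reversed_ops, _ = _clip_start(ops[::-1], right)  # POS is unaffected
    ops = reversed_ops[::-1]
    return "".join(f"{n}{op}" for n, op in ops), pos + shift


# A spliced read: clipping 60 bases on the right removes the second exon
# block and the intron (N) along with it.
print(clip_cigar("50M1000N50M", 100, left=5, right=60))
```

Handling the right end by reversing the operation list keeps the logic in one place; only a left-end clip moves POS, because POS is the leftmost aligned reference base. A read with no aligned bases left raises an error here, since it would have to be flagged as unmapped instead.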
One possibility would be to use the GATK toolset, for example ClipReads. If you want to remove adaptors, you can use ReadAdaptorTrimmer. No conversion back to fastq is needed (documentation: http://www.broadinstitute.org/gatk/gatkdocs/).
Picard is, of course, another possibility.
Trimming reads in a bam file comes up when you want to normalize the reads to the same length after you have already done a large amount of alignment work. Remapping after trimming the fastq reads is inefficient, so trimming the reads in place in the bam file is the preferable solution.
Try reformat.sh from bbmap, which accepts bam input and can trim the reads:
reformat.sh in=test.bam out=test_trim.bam allowidenticalnames=t overwrite=true forcetrimright=74 sam=1.4
## The default output format of reformat.sh is sam 1.4. However, many tools only recognize version 1.3, so the next step converts the 1.4 output to version 1.3.
reformat.sh in=test_trim.bam out=test_trim_1.3.bam allowidenticalnames=t overwrite=true sam=1.3
