I am assembling a transcriptome of the terminal ganglion of the cricket (Gryllus bimaculatus). We are working on a non-model species and thus did not remove ribosomal contamination during library prep, so we want to remove it bioinformatically instead. To do this, I am mapping our reads, which have already gone through quality control and error correction with FastQC and Rcorrector, against the SILVA ribosomal database (https://www.arb-silva.de/). I uploaded the SILVA FASTA files to the HPC, concatenated them, converted the U's to T's, and then built a bowtie2 index from the resulting FASTA file. Now we are running into a problem when we try to map (bowtie2-align) our reads against the SILVA database. Have you ever come across an error like this before? If so, how did you go about solving it?
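For reference, the database prep looked roughly like this (a sketch; the two input FASTA names below are placeholders, not my exact downloaded files):

# Sketch of the SILVA index prep; SILVA_SSU.fasta / SILVA_LSU.fasta are placeholder names
cat SILVA_SSU.fasta SILVA_LSU.fasta > silva_combined.fasta    # concatenate the downloaded FASTAs
sed '/^>/! y/Uu/Tt/' silva_combined.fasta > silva_db.fasta    # convert RNA U's to DNA T's on sequence lines only
bowtie2-build silva_db.fasta silva_db                         # build the bowtie2 index (basename: silva_db)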
I have tried many variations of this script, mostly troubleshooting the parts I have marked with stars (**). For the starred alignment preset, I have also tried --sensitive-local, --fast-local, and --very-fast-local, in different combinations with thread counts and with these moosehead submission commands:
qsub -l 10g=true -pe smp 16 silva_test9.sh
qsub -l gpu=1 -pe smp 16 silva_test10.sh
qsub -l 10g=true -pe smp 40 silva_test5.sh
I have also varied the number of threads requested (the value after "smp") when submitting the script to our campus HPC.
Within the script itself, I have also varied --threads, both by using cpu as a variable and by hard-coding the thread count (12, 16, 40, etc.). The maximum amount of RAM we can use is 360 GB.
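To illustrate what I mean by using cpu as a variable, here is a minimal sketch (assuming an SGE-style scheduler that exports NSLOTS for the granted slots; that variable name is an assumption on my part for moosehead):

# Sketch: derive the bowtie2 thread count from the slots granted via "-pe smp N"
# NSLOTS is set by SGE-style schedulers; whether moosehead exports it is an assumption
cpu=${NSLOTS:-16}                                # fall back to 16 if NSLOTS is unset
echo "bowtie2 will be run with ${cpu} threads"

The bowtie2 call then uses --threads "$cpu" in place of a hard-coded number.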
Most of the time I have ended up aborting the job, sometimes after a couple of hours and sometimes after a few days. Output files are created in the correct location, but there is nothing in them. The error message I have been receiving is "bowtie2 died with signal 9 (KILL)".
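Checking a finished job afterwards looks something like this (assuming moosehead runs SGE and keeps qacct accounting; both are assumptions on my part, and the job ID is a placeholder):

# Sketch: look up a finished job's accounting record to see why it was killed
qacct -j 123456 | egrep 'exit_status|failed|maxvmem'

Below is the current version of the script: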
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -pe smp 40
#$ -M mprasad@bowdoin.edu -m be
export silva_db=/mnt/research/hhorch/term_RNAseq_analysis/silva_db/silva_db/silva_db
export r1=/mnt/research/hhorch/term_RNAseq_analysis/trimmed_reads_cor_val/fixed_1C_1_R1.cor_val_1.fq
export r2=/mnt/research/hhorch/term_RNAseq_analysis/trimmed_reads_cor_val/fixed_1C_1_R2.cor_val_2.fq
export summary=/mnt/research/hhorch/term_RNAseq_analysis/rRNA/1C_1_rrna_summary.txt
export mapped=/mnt/research/hhorch/term_RNAseq_analysis/rRNA/rrna_mapped_1C_1
export unmapped=/mnt/research/hhorch/term_RNAseq_analysis/rRNA/rrna_unmapped_1C_1
export single_mapped=/mnt/research/hhorch/term_RNAseq_analysis/rRNA/single_mapped_1C_1
export single_unmapped=/mnt/research/hhorch/term_RNAseq_analysis/rRNA/single_unmapped_1C_1
export name=1C_1   # sample name used for the SAM output
bowtie2 **--very-sensitive-local** --phred33 -x $silva_db -1 $r1 -2 $r2 **--threads 40** --met-file $summary --al-conc-gz $mapped --un-conc-gz $unmapped --al-gz $single_mapped --un-gz $single_unmapped -S "$name"_out.sam
I expect the output to be completed summary, mapped, unmapped, single_mapped, and single_unmapped files in /mnt/research/hhorch/term_RNAseq_analysis/rRNA. I have been following the Harvard Informatics Best Practices for de novo Transcriptome Assembly (https://informatics.fas.harvard.edu/best-practices-for-de-novo-transcriptome-assembly-with-trinity.html) very closely.
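For reference, this is roughly how I check the outputs once a job has finished (the wildcard covers whatever suffixes bowtie2 appends to the --*-conc-gz basenames):

# Quick check of the output directory after the job ends
ls -lh /mnt/research/hhorch/term_RNAseq_analysis/rRNA/
zcat /mnt/research/hhorch/term_RNAseq_analysis/rRNA/rrna_unmapped_1C_1* 2>/dev/null | head   # prints nothing for me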