0

Issue

I have a few PE fastq files from an infected host. First I map reads to the host and keep reads that did not map and convert that single bam to new paired end fastq files. The next loop takes the new PE fastq files and maps them to the pathogen. The problem I'm facing is the beginning of the second loop does not find the associated R2.fastq. All work is being done on my institution's linux compute cluster.

The appropriate files are created at the end of the first loop and the second loop is able to find the R1 files, but not the F2 files in the same directory. I have stared at this for a couple days now, making changes in an attempt to figure out the naming issue.

Any help determining the issue with the second for loop would be greatly appreciated. Keep in mind this is my first post and my degree is in biology. Please gentle.

Code

#PBS -S /bin/bash
#PBS -l partition=bigmem,nodes=1:ppn=16,walltime=1:00:00:00
#PBS -A ACF-UTK0011
#PBS -M wbrewer5@vols.utk.edu
#PBS -m abe
#PBS -e /lustre/haven/user/wbrewer5/pandora/lowcov/error/
#PBS -o /lustre/haven/user/wbrewer5/pandora/lowcov/log/
#PBS -N PandoraLowCovMapping1
#PBS -n

cd $PBS_O_WORKDIR

set -x
module load samtools
module load bwa

#create indexes for pea aphid and pandora genomes
#bwa index -p pea_aphid.fna pea_aphid.fna
#bwa index -p pandora_canu_pilon.fasta pandora_canu_pilon.fasta

#map read files to the aphid genome and keep reads that do not map
for r1 in `ls /lustre/haven/user/wbrewer5/pandora/lowcov/reads/*R1.fastq`

do

  r2=`sed 's/R1.fastq/R2.fastq/' <(echo $r1)`
  BASE1=$(basename $r1 | sed 's/_R1.fastq*//g')
  echo "r1 $r1"
  echo "r2 $r2"
  echo "BASE1 $BASE1"

  bwa mem -t 16 -v 3 \
  pea_aphid.fna \
  $r1 \
  $r2 |
  samtools view -@ 16 -u -f 12 -F 256 - |
  samtools sort -@ 16 -n - |
  samtools fastq - \
  -1 /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/$BASE1\_unmapped_R1.fastq \
  -2 /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/$BASE1\_unmapped_R2.fastq \
  -0 /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/$BASE1\_trash.txt \
  -s /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/$BASE1\_more_trash.txt

  echo "Step 1: mapped reads from $BASE1 to aphid genome and saved to 1_samtools as paired end .fastq"

done

rm /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/*trash*

echo "saving unmapped reads to new fastq files complete!"

for f1 in `ls /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/*unmapped_R1.fastq`

do

  f2=`sed 's/R1.fastq/R2.fastq/' >(echo $f1)`
  BASE2=$(basename $f1 | sed 's/_R1.fastq*//g')
  echo "f1 $f1"
  echo "f2 $f2"
  echo "BASE2 $BASE2"

  bwa mem -t 16 -v 3 \
  pandora_canu_pilon.fasta \
  $f1 \
  $f2 |
  samtools sort -@ 16 -o ./2_angsd/$BASE2\.bam -

  echo "Step 2: mapped reads from $BASE2 to pandora genome saved to 2_angsd as .bam"

done

echo "Mapping new fastq files to pandora genome complete!!"

Log

First file of first loop

++ ls /lustre/haven/user/wbrewer5/pandora/lowcov/reads/Matt_251_R1.fastq /lustre/haven/user/wbrewer5/pandora/lowcov/reads/Matt_614_R1.fastq /lustre/haven/user/wbrewer5/pandora/lowcov/reads/Matt_686_R1.fastq /lustre/haven/user/wbrewer5/pandora/lowcov/reads/p-251_R1.fastq /lustre/haven/user/wbrewer5/pandora/lowcov/reads/p-614_R1.fastq /lustre/haven/user/wbrewer5/pandora/lowcov/reads/p-686_R1.fastq
+ for r1 in '`ls /lustre/haven/user/wbrewer5/pandora/lowcov/reads/*R1.fastq`'
++ sed s/R1.fastq/R2.fastq/ /dev/fd/63
+++ echo /lustre/haven/user/wbrewer5/pandora/lowcov/reads/Matt_251_R1.fastq
+ r2=/lustre/haven/user/wbrewer5/pandora/lowcov/reads/Matt_251_R2.fastq
++ basename /lustre/haven/user/wbrewer5/pandora/lowcov/reads/Matt_251_R1.fastq
++ sed 's/_R1.fastq*//g'
+ BASE1=Matt_251
+ echo 'r1 /lustre/haven/user/wbrewer5/pandora/lowcov/reads/Matt_251_R1.fastq'
+ echo 'r2 /lustre/haven/user/wbrewer5/pandora/lowcov/reads/Matt_251_R2.fastq'
+ echo 'BASE1 Matt_251'
+ bwa mem -t 16 -v 3 pea_aphid.fna /lustre/haven/user/wbrewer5/pandora/lowcov/reads/Matt_251_R1.fastq /lustre/haven/user/wbrewer5/pandora/lowcov/reads/Matt_251_R2.fastq
+ samtools view -@ 16 -u -f 12 -F 256 -
+ samtools sort -@ 16 -n -
+ samtools fastq - -1 /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/Matt_251_unmapped_R1.fastq -2 /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/Matt_251_unmapped_R2.fastq -0 /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/Matt_251_trash.txt -s /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/Matt_251_more_trash.txt

First file of second loop

++ ls /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/Matt_251_unmapped_R1.fastq /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/Matt_614_unmapped_R1.fastq /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/Matt_686_unmapped_R1.fastq /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/p-251_unmapped_R1.fastq /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/p-614_unmapped_R1.fastq /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/p-686_unmapped_R1.fastq
+ for f1 in '`ls /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/*unmapped_R1.fastq`'
++ sed s/R1.fastq/R2.fastq/ /dev/fd/63
+++ echo /lustre/haven/user/wbrewer5/pandora/lowcov/1_samtools/Matt_251_unmapped_R1.fastq
  • As the [bash tag](https://stackoverflow.com/tags/bash/info) you used in your question instructs you to do - copy/paste your shell script(s) into https://shellcheck.net and fix the problems that tool tells you about before then posting here if you still have a problem afterwards. – Ed Morton Apr 08 '22 at 17:35
  • Side note: [Do not parse output from `ls`.](http://mywiki.wooledge.org/ParsingLs) – Andrej Podzimek Apr 08 '22 at 17:45
  • one quick comment re: `f2=\`sed 's/R1.fastq/R2.fastq/' >(echo $f1)\``... assuming the objective is to modify the name of `$f1` and save the new name in `f2`, this piece of code doesn't do what you think it does; consider replacing with `f2=$(sed 's/R1.fastq/R2.fastq/' <<< "$f1")`, or if the `R1` only shows up in the file name: `f2="${f1/R1/R2}"` – markp-fuso Apr 08 '22 at 21:35
  • @markp-fuso This was the fix. I definitely need to understand more. Would you be able to briefly explain the difference in outcome between those two lines? I am curious why the same formatting worked for the first loop. Thank you very much for solving a week-long mystery. – willbrew Apr 10 '22 at 13:57
  • @EdMorton Thank you for mentioning this resource. I had not noticed a pop up after adding the bash tag. That is indeed a helpful site aside from throwing errors at the syntax necessary for the scheduler. – willbrew Apr 10 '22 at 13:59
  • @AndrejPodzimek I greatly appreciate the guidance. While not the problem here, I am glad to have learned of this issue before I ran into it. – willbrew Apr 10 '22 at 14:01
  • @willbrew you're welcome. Regarding `aside from throwing errors at the syntax necessary for the scheduler` - it's extremely unlikely that "the scheduler" (whatever that is) requires syntax that is flagged as errors by spellcheck. You may want to not dismiss those as there's probably some alternative non-erroneous syntax that "the scheduler" would also accept that'd save you from future problems. – Ed Morton Apr 10 '22 at 14:06
  • @willbrew in the first loop you use `<(echo $r1)` while in the 2nd loop you use `>(echo $r1)` ... notice the `<` vs `>`; you may be able to 'fix' the 2nd loop by using `<(echo $r1)` but I'd suggest one of the other recommendations; the `<<< "$f1"` is called a here-string and allows you to pass a string/variable into a command that's normally looking for a file; the `${f1/R1/R2}` falls under the category of [parameter substitution](https://tldp.org/LDP/abs/html/parameter-substitution.html) – markp-fuso Apr 10 '22 at 16:52
  • @EdMorton They weren't necessarily errors, just comments. I'm using the term "scheduler" as this is how I have heard it referred to in meetings with system administrators when discussing the syntax needed for jobs to be accepted by the queue. – willbrew Apr 11 '22 at 15:06
  • 1
    @markp-fuso It never fails to be a simple, incorrect character that wastes multiple days of work. Thank you again for the help. – willbrew Apr 11 '22 at 15:08

0 Answers0