I am trying to create an automated chain of commands for analyzing biological data.
For this I am using Samtools on a Slurm cluster. The line below is one of the commands I run for the analysis:
samtools view -h file.sam | awk '$6 ~ /N/ || $1 ~ /^@/' | samtools view -h > spliced.file.sam
Using this, I get my expected output (simple).
However, when I put the command into a job with --wrap, I get a syntax error:

sbatch --wrap "samtools view -h file.sam | awk '$6 ~ /N/ || $1 ~ /^@/' | samtools view -h > sp.file.sam"

    awk:  ~ /N/ ||  ~ /^@/
    awk:  ^ syntax error

Using srun at the start of the command and & at the end is very helpful when submitting, but can I use that when I want to create a pipeline of commands? Can I add a dependency for this command? And is there a way to use --wrap for this command?

I am aiming to create an automated pipeline of commands, as described here: https://gencore.bio.nyu.edu/building-an-analysis-pipeline-for-hpc-using-python/

Thanks in advance.

  • Have you tried escaping the `'` in the awk command with `\`? – Carles Fenoy Feb 09 '22 at 21:34
  • Use a workflow framework like nextflow, snakemake, or cromwell when handling jobs big enough to warrant using a cluster :) It will take some time to learn but save tons in the end. – Pallie Feb 09 '22 at 22:36
  • I've never heard of `sbatch` so I might be way off, but it sounds like something is interpreting `$6` and `$1` before awk gets to see them, so try changing those to `\$6` and `\$1`. – Ed Morton Feb 10 '22 at 01:59
  • That's because you used double quotes around the command, and the awk program inside contains `$6` and `$1`. Those `$` characters need to be escaped as `\$6` and `\$1` to prevent the shell from expanding them, as mentioned by @EdMorton too. – αғsнιη Feb 10 '22 at 15:16
  • thank you Ed Morton and αғsнιη for your help, it worked! – shaked shanas Feb 14 '22 at 08:37

2 Answers

The most straightforward way to do that is to put the pipeline

samtools view -h file.sam | awk '$6 ~ /N/ || $1 ~ /^@/' | samtools view -h > spliced.file.sam

in a shell script (e.g. myscript.sh):

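#!/usr/bin/env bash

# Abort with a usage message if no input file is given.
file=${1?Usage: $0 <file.sam>}

# Keep header lines (@...) and reads whose CIGAR string (field 6) contains N.
# The output is named after the input so that jobs for different files do not overwrite each other.
samtools view -h "$file" | awk '$6 ~ /N/ || $1 ~ /^@/' | samtools view -h > "$(dirname "$file")/spliced.$(basename "$file")"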

so that you can then issue

sbatch --wrap "./myscript.sh file.sam"

without the burden of escaping quotes and dollar signs for the shell. This also lets you run commands like this

find . -name \*.sam -print0 | xargs -0 -I{} sbatch --wrap "./myscript.sh {}"

which submits one job per .sam file found in the current directory or any of its subdirectories, or you can call the script from a Python script like the reference you mention.
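As for the dependency part of the question, one possible approach is to capture the job ID that sbatch prints and pass it to the next submission. The lines below are only a sketch: next_step.sh is a hypothetical second script, and the small stand-in sbatch function exists solely so the lines can be tried on a machine without Slurm.

```shell
# Stand-in for machines without Slurm (an assumption for illustration only):
# it ignores its arguments and prints a fake job ID.
if ! command -v sbatch >/dev/null 2>&1; then
    sbatch() { echo 1000; }
fi

# --parsable makes sbatch print only the job ID
# (followed by ";cluster" on multi-cluster setups, which is stripped below).
jid1=$(sbatch --parsable --wrap "./myscript.sh file.sam")
jid1=${jid1%%;*}

# The second job starts only after the first one has finished successfully.
# next_step.sh is a hypothetical script, not something from the question.
jid2=$(sbatch --parsable --dependency=afterok:"$jid1" --wrap "./next_step.sh spliced.file.sam")
jid2=${jid2%%;*}

echo "submitted $jid1, then $jid2 (waits for $jid1)"
```

With afterok, the dependent job is not started if the first job fails, which is usually what you want in an analysis pipeline.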

damienfrancois

This has nothing to do with the single quotes. You can do

sbatch --wrap="samtools view | head| awk '{print}'"

and it'll work just fine.

If you don't want to save the command in a shell script, you need to escape the dollar signs inside the awk program, because inside double quotes the shell expands $6 and $1 before sbatch ever receives the string:

sbatch --wrap "samtools view -h file.sam | awk '\$6 ~ /N/ || \$1 ~ /^@/' | samtools view -h > sp.file.sam"

The error is:

awk: ~ /N/ || ~ /^@/

Compared to the command you ran, neither field reference ($6 or $1) made it into the sbatch submission, which means the dollar signs were most likely the problem: the submitting shell replaced both with empty strings. I had a similar problem, and escaping every $ worked for me.
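You can check this without submitting anything by printing the string the shell would hand to sbatch. This is just a small sketch; the variable names are made up for illustration, and set -- is there to make sure no positional parameters are defined.

```shell
# Clear the positional parameters so that $6 and $1 are empty,
# as they are in a typical interactive shell.
set --

# Double quotes: the shell expands $6 and $1 (to nothing) right away.
unescaped="awk '$6 ~ /N/ || $1 ~ /^@/'"

# Escaped dollar signs stay literal, so awk later sees its field references.
escaped="awk '\$6 ~ /N/ || \$1 ~ /^@/'"

printf '%s\n' "$unescaped"   # awk ' ~ /N/ ||  ~ /^@/'
printf '%s\n' "$escaped"     # awk '$6 ~ /N/ || $1 ~ /^@/'
```

The first printed line is exactly the broken program that awk complained about.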

Meng