I have 44 .tsv files in one folder, and I want to count the intersections for every pair of files using the intersect command of bedtools. Each output has 4 columns, and I only need to save the sum of column 4 for each pair. This works when I run the pairs one at a time, but when I use parallel to run all pairs at once I get a syntax error.

Here is the command and its result when I run a single pair manually:

$ bedtools intersect -a p1.tsv -b p2.tsv -c

chr1    1   5   1
chr1    8   12  1
chr1    18  20  1
chr1    21  25  0

$ bedtools intersect -a p1.tsv -b p2.tsv -c | awk '{sum+=$4} END {print sum}'

3
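The summing step can be checked without bedtools. The following is a minimal sketch that feeds the four sample rows shown above (assumed here to be tab-separated) straight into the same awk program; the variable names sample and total are only for illustration.

```shell
# Sample rows copied from the bedtools output above (assumed tab-separated).
sample=$(printf 'chr1\t1\t5\t1\nchr1\t8\t12\t1\nchr1\t18\t20\t1\nchr1\t21\t25\t0\n')

# Add up column 4 (the per-interval overlap count) and print the total.
total=$(printf '%s\n' "$sample" | awk '{sum+=$4} END {print sum}')
echo "$total"   # 3
```

The total is 1 + 1 + 1 + 0 = 3, which matches the manual run.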

Here is the command and its result when I use parallel:

$ parallel "bedtools intersect -a {1} -b {2} -c |awk '{sum+=$4} END {print sum}'> {1}.{2}.intersect" ::: `ls *.tsv` ::: `ls *.tsv`

awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error

The result should be 44*44 files, each containing a single value, for example just 3.

Mark Setchell

3 Answers

@DudiBoy has a good solution. But to me it is annoying that I have to make another file just because I want to call GNU Parallel.

So you can also use functions. This way you do not need to make a new file:

doit() {
  bedtools intersect -a "$1" -b "$2" -c | awk '{sum+=$4} END {print sum}'
}
export -f doit

parallel --results {1}.{2}.intersect doit {1} {2} ::: *.tsv ::: *.tsv
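
The export -f line matters because each job runs in a new shell, which only knows functions that were exported. Here is a small bash-only sketch of that behaviour, using a hypothetical stand-in function named pair instead of doit, so it runs without bedtools or parallel:

```shell
#!/bin/bash
# Hypothetical stand-in for doit: it only echoes its two arguments.
pair() {
  echo "a=$1 b=$2"
}

# Before export -f, a child bash does not know the function.
without_export=$(bash -c 'pair x.tsv y.tsv' 2>/dev/null || echo "not found")

export -f pair

# After export -f, the child bash inherits the function through the environment.
with_export=$(bash -c 'pair x.tsv y.tsv')

echo "$without_export"   # not found
echo "$with_export"      # a=x.tsv b=y.tsv
```

Note that export -f is a bash feature, so this approach assumes the jobs are run by bash.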
Ole Tange

I think you need to quote it like this:

parallel bedtools intersect -a {1} -b {2} -c \| awk \'{sum+=\$4} END{print sum+0}\' \> {1}.{2}.intersect ::: *tsv ::: *tsv
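
To see why the escaping helps, here is a minimal sketch that can be run without bedtools or parallel. Inside double quotes the calling shell replaces $4 with its own fourth positional parameter (normally empty), which produces exactly the {sum+=} seen in the error message. The sketch also shows what the sum+0 in the command above changes when there is no input. The variable names are only for illustration.

```shell
# Clear the positional parameters so $4 is definitely unset for this demo.
set --

# Inside double quotes the shell expands $4 (empty) before awk ever sees it.
double_quoted="awk '{sum+=$4} END {print sum}'"
echo "$double_quoted"   # awk '{sum+=} END {print sum}'

# Escaping the dollar sign keeps it literal, so awk receives $4.
escaped="awk '{sum+=\$4} END {print sum}'"
echo "$escaped"         # awk '{sum+=$4} END {print sum}'

# With empty input, "print sum" prints a blank line; "sum+0" forces a number.
empty_plain=$(printf '' | awk '{sum+=$4} END {print sum}')
empty_plus0=$(printf '' | awk '{sum+=$4} END {print sum+0}')
echo "plain: '$empty_plain'  with +0: '$empty_plus0'"   # plain: ''  with +0: '0'
```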
Mark Setchell

I believe @MarkSetchell's answer is valid. You can also clean it up by moving the complicated command line into a bash script that you can test on its own.

intersect.bash

#!/bin/bash
# Sum column 4 of the bedtools output for the file pair given as $1 and $2.
bedtools intersect -a "$1" -b "$2" -c | awk '{sum+=$4} END {print sum}'

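Make intersect.bash executable (chmod +x intersect.bash), test that it works correctly on one pair, then run it with parallel over all pairs:

parallel ./intersect.bash {1} {2} '>' {1}.{2}.intersect ::: *.tsv ::: *.tsv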

Good luck.

Dudi Boy