
I have to process a large file, and I have been reading about the GNU parallel command as a way to use more than one processor core when running sed, sort and so on. As a first step, I want to change the first line of every group of four (because of the naming conventions of this kind of file, the FASTQ format).

For example, this would be a group of four, and I want to modify the first line:

cat sbcc073_pcm_ill_all.musket_default.fastq | head -4

@HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4289_1:N:0:ACTTGA
GCGAGAGAATGGATGAGTTGATAGTTACACAGCGGTTTTGATATACTGATGCCTTGTATATGTTCGT
+
GHHHHHHHHHHGGEEGEDGGGGH=HHHHHEGDBFF8BED=BAEEEAHHHBD>GGGEEHHHFE>GG@E

The following command does the job:

cat sbcc073_pcm_ill_all.musket_default.fastq | head -4 | sed 's#^\(@.*\)_\([12]\).*#\1/\2#'

@HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4289/1
GCGAGAGAATGGATGAGTTGATAGTTACACAGCGGTTTTGATATACTGATGCCTTGTATATGTTCGT
+
GHHHHHHHHHHGGEEGEDGGGGH=HHHHHEGDBFF8BED=BAEEEAHHHBD>GGGEEHHHFE>GG@E

However, when I run it through parallel, the capture-group parentheses do not seem to be recognized:

cat sbcc073_pcm_ill_all.musket_default.fastq | head -4 | parallel --pipe sed 's#^\(@.*\)_\([12]\).*#\1/\2#'

@HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4289_1:N:0:ACTTGA
GCGAGAGAATGGATGAGTTGATAGTTACACAGCGGTTTTGATATACTGATGCCTTGTATATGTTCGT
+
GHHHHHHHHHHGGEEGEDGGGGH=HHHHHEGDBFF8BED=BAEEEAHHHBD>GGGEEHHHFE>GG@E

When I remove the backslashes or use sed -r, the command fails with:

/bin/bash: -c: line 3: syntax error near unexpected token `('
/bin/bash: -c: line 3: `             (cat /tmp/60xrxvCIRX.chr; rm /tmp/60xrxvCIRX.chr; cat - ) | (sed s#^(@.*)_([12]).*#\1/\2# );'

Could anyone shed some light on this?

Thank you very much.

Leandro Papasidero
  • You should consider using Perl module [`Bio::Index::Fastq`](http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html) – mvp May 08 '13 at 09:50

1 Answer

parallel --pipe "sed 's#^\(@.*\)_\([12]\).*#\1/\2#'"

Try enclosing the full sed command in double quotes, as shown above.
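
A likely explanation for why the extra quotes matter: parallel appears to join its arguments into one string and hand that string to a shell (the "/bin/bash: -c:" error in the question points that way), so the command is parsed twice. The single quotes are consumed by the first shell, and the second shell then strips the backslashes, leaving sed with literal parentheses and a replacement of "1/2". The sketch below mimics the second parse with sh -c instead of calling parallel itself; run_twice is a hypothetical helper introduced only for this illustration.

```shell
#!/bin/sh
# One FASTQ header line, copied from the question.
header='@HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4289_1:N:0:ACTTGA'

# Hypothetical helper (not part of parallel): feeds the header to a command
# string that is parsed by a second shell, roughly as parallel seems to do.
run_twice() {
    printf '%s\n' "$header" | sh -c "$1"
}

# What the second shell sees when only single quotes were used: the quotes
# are already gone, so the second shell removes the backslashes and sed
# receives s#^(@.*)_([12]).*#1/2#, which does not match the header.
unquoted=$(run_twice 'sed s#^\(@.*\)_\([12]\).*#\1/\2#')

# What the second shell sees with the extra double quotes: the inner single
# quotes survive the first parse and protect the expression in the second.
quoted=$(run_twice "sed 's#^\(@.*\)_\([12]\).*#\1/\2#'")

echo "without inner quotes: $unquoted"
echo "with inner quotes:    $quoted"
```

The first result should be the unchanged header, as in the question, while the second should end in 4289/1.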

Sidharth C. Nadhan
  • Thank you. And why do I need to escape '$' here? parallel --pipe "perl -lne 'if($.%4==1){s/^(@.*)_([12]).*/\$1\/\$2/;print}' " – cantalapiedra May 08 '13 at 12:04
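
On the '$' question in the comment above: inside double quotes the calling shell still expands $1 and $2 as its own positional parameters, and single quotes nested inside double quotes do not prevent that. Without the backslashes, Perl would therefore never see its capture variables. A minimal sketch of the difference, assuming a shell with no positional parameters set (which set -- enforces here); the variable names are chosen only for illustration.

```shell
#!/bin/sh
# Clear the positional parameters so that $1 and $2 expand to nothing,
# as they typically would in an interactive shell.
set --

# Unescaped: the shell substitutes $1 and $2 before Perl ever runs.
unescaped="s/^(@.*)_([12]).*/$1\/$2/"

# Escaped: \$ yields a literal dollar sign, so Perl receives $1 and $2.
escaped="s/^(@.*)_([12]).*/\$1\/\$2/"

echo "unescaped: $unescaped"
echo "escaped:   $escaped"
```

In bash, $. happens to survive unescaped because '.' cannot start a parameter name, although escaping it as well would be the more cautious habit.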