I have a FASTA file from which I use awk to extract the fields I need (the sequences with their headers). I then pipe the output to a BLAST program and finally pipe that to qsub in order to submit a job. The file:
>sequence_1
ACTGACTGACTGACTG
>sequence_2
ACTGGTCAGTCAGTAA
>sequence_3
CCGTTGAGTAGAAGAA
And the command (which works):
awk < fasta.fasta '/^>/ { print $0 } $0 !~ /^>/' | echo "/Local/ncbi-blast-2.2.25+/bin/blastx -db blastdb.fa -outfmt 5 >> /User/blastresult.xml" | qsub -q S
What I would like to do is add a condition that checks the number of jobs I am currently running (using qstat) and submits the job only if that number is below a certain threshold. For example:
allowed_jobs=200 #for example
awk < fasta.fasta '/^>/ { print $0 } $0 !~ /^>/' | echo "/Local/ncbi-blast-2.2.25+/bin/blastx -db blastdb.fa -outfmt 5 >> /User/blastresult.xml" | cmd=$(qstat -u User | grep -c ".") | if [ $cmd -lt $allowed_jobs ]; then qsub -q S
Unfortunately (for me, anyway) all my attempts to do that have failed. I'd be grateful for any help.
EDIT: Elaborating a bit: what I am trying to do is extract this from the FASTA file:
>sequence_x
ACTATATATATA
Or basically: >HEADER\nSEQUENCE, one record at a time, and pipe it to the BLAST program, which can read from stdin. I want to create a separate job for each sequence, which is why I want to pipe to qsub once per sequence. To put it plainly, the qsub submission would look something like this:
qsub -q S /Local/ncbi-blast-2.2.25+/bin/blastx -db blastdb.fa -query FASTA_SEQUENCE -outfmt 5 >> /User/blastresult.xml
Note that the -query flag is unnecessary if the sequence is piped in on stdin. However, my main problem is how to incorporate the condition I mentioned above, so that a sequence is piped to qsub only if the qstat count is below the threshold. Ideally, if the qstat count is at or above the threshold, the script would sleep until it drops below and only then pass the sequence on.
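To make the intended behaviour concrete, here is a rough sketch of the loop I have in mind, not something I can vouch for on a real cluster. The function names (count_jobs, wait_for_slot, submit_record, submit_each_record) and the poll_seconds variable are made up for illustration. It assumes that qsub accepts the job script on stdin (as in my working command above), that blastx reads the query from stdin, and that counting the non-empty lines of qstat is a good enough job count (it may include header lines).

```shell
#!/bin/sh
# Sketch only: all function and variable names here are illustrative.

allowed_jobs=200     # threshold from the example above
poll_seconds=30      # assumed polling interval, adjust as needed

# Count non-empty lines of qstat output, as in my attempt above.
# Assumption: this may also count header lines, so it can overestimate.
count_jobs() {
    qstat -u User | grep -c "."
}

# Block until the job count is below the threshold.
wait_for_slot() {
    while [ "$(count_jobs)" -ge "$allowed_jobs" ]; do
        sleep "$poll_seconds"
    done
}

# Submit one FASTA record ($1 = header line plus sequence lines).
# The generated job script feeds the record to blastx via a here-document.
submit_record() {
    wait_for_slot
    {
        echo "/Local/ncbi-blast-2.2.25+/bin/blastx -db blastdb.fa -outfmt 5 >> /User/blastresult.xml <<'FASTA_RECORD'"
        printf '%s\n' "$1"
        echo "FASTA_RECORD"
    } | qsub -q S
}

# Walk the FASTA file ($1) and submit each >HEADER/SEQUENCE record in turn.
# The file is read on fd 3 so that qstat/qsub cannot consume it via stdin.
submit_each_record() {
    record=""
    while IFS= read -r line <&3 || [ -n "$line" ]; do
        case $line in
            ">"*)
                # A new header closes the previous record, if there is one.
                if [ -n "$record" ]; then submit_record "$record"; fi
                record=$line
                ;;
            *)
                # Sequence line: append it to the current record.
                if [ -n "$record" ]; then
                    record=$(printf '%s\n%s' "$record" "$line")
                fi
                ;;
        esac
    done 3< "$1"
    # Submit the last record in the file.
    if [ -n "$record" ]; then submit_record "$record"; fi
}
```

With these defined, the whole run would be a single call: submit_each_record fasta.fasta. Note that every job still appends to the same /User/blastresult.xml, exactly as in my original command.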
thanks.