-5

I am running the following script:

# Usage: sh hmmscan-parser.sh hmmscan-file

# 1. take hmmer3 output and generate the tabular output
# 2. sort on the 6th and 7th cols
# 3. remove overlapped/redundant hmm matches and keep the one with the lower e-values
# 4. calculate the covered fraction of hmm (make sure you have downloaded the "all.hmm.ps.len" file to the same directory of this perl script)
# 5. apply the E-value cutoff and the covered faction cutoff
cat $1 | perl -e 'while(<>){if(/^\/\//){$x=join("",@a);($q)=($x=~/^Query:\s+(\S+)/m);while($x=~/^>> (\S+.*?\n\n)/msg){$a=$&;@c=split(/\n/,$a);$c[0]=~s/>> //;for($i=3;$i<=$#c;$i++){@d=split(/\s+/,$c[$i]);print $q."\t".$c[0]."\t$d[6]\t$d[7]\t$d[8]\t$d[10]\t$d[11]\n" if $d[6]<1;}}@a=();}else{push(@a,$_);}}' \
        | sort -k 1,1 -k 6,6n -k 7,7n | uniq \
        | perl -e 'while(<>){chomp;@a=split;next if $a[-1]==$a[-2];push(@{$b{$a[0]}},$_);}foreach(sort keys %b){@a=@{$b{$_}};for($i=0;$i<$#a;$i++){@b=split(/\t/,$a[$i]);@c=split(/\t/,$a[$i+1]);$len1=$b[-1]-$b[-2];$len2=$c[-1]-$c[-2];$len3=$b[-1]-$c[-2];if($len3>0 and ($len3/$len1>0.5 or $len3/$len2>0.5)){if($b[2]<$c[2]){splice(@a,$i+1,1);}else{splice(@a,$i,1);}$i=$i-1;}}foreach(@a){print $_."\n";}}' \
        | uniq | perl -e 'open(IN,"all.hmm.ps.len");

while(<IN>)
{
        chomp;
        @a=split;
        $b{$a[0]}=$a[1]; # creates hash of hmmName : hmmLength
}

while(<>)
{
        chomp;
        @a=split;
        $r=($a[3]-$a[2])/$b{$a[1]}; # $a[4] = hmm end $a[3] = hmm start ; $b{$a[1]} = result of the hash of the name of the hmm (hmm length).
        print $_."\t".$r."\n";
}' \
        | perl -e 'while(<>){@a=split(/\t/,$_);if(($a[-2]-$a[-3])>80){print $_ if $a[2]<1e-5;}else{print $_ if $a[2]<1e-3;}}' | awk '$NF>0.3'

When I run the file, I get "Illegal division by zero at -e line 14, <> line 1" Line 14 is :

$r=($a[3]-$a[2])/$b{$a[1]};

The first input (hmmscan-file) is under this form :

Query:       NODE_1_length_300803_cov_11.207433_1  [L=264]
Description: # 139 # 930 # 1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=AGGAG;rbs_spacer=5-10bp;gc_cont=0.543
Scores for complete sequence (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Model    Description
    ------- ------ -----    ------- ------ -----   ---- --  -------- -----------

   [No hits detected that satisfy reporting thresholds]


Domain annotation for each model (and alignments):

   [No targets detected that satisfy reporting thresholds]


Internal pipeline statistics summary:
-------------------------------------
Query sequence(s):                         1  (264 residues searched)
Target model(s):                         641  (202466 nodes)
Passed MSV filter:                        18  (0.0280811); expected 12.8 (0.02)
Passed bias filter:                       18  (0.0280811); expected 12.8 (0.02)
Passed Vit filter:                         1  (0.00156006); expected 0.6 (0.001)
Passed Fwd filter:                         0  (0); expected 0.0 (1e-05)
Initial search space (Z):                641  [actual number of targets]
Domain search space  (domZ):               0  [number of targets reported over threshold]
# CPU time: 0.02u 0.02s 00:00:00.04 Elapsed: 00:00:00.03
# Mc/sec: 1357.56
//
Query:       NODE_1_length_300803_cov_11.207433_2  [L=184]
Description: # 945 # 1496 # 1 # ID=1_2;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.431

the second input named (all.hmm.ps.len) is under this form whoever the third pearl command calls him.

BM10.hmm        28
CBM11.hmm       163

I did not understand where the problem is knowing that the script in general aims to create an array to clearly read the input (hmmscan-file).

thank you very much

SAAD
  • 27
  • 6
  • 1
    `$b{$a[1]}` in the division is zero (or undefined). Probably a missing key in the hash. If you add warnings `perl -we` you might get more useful information. Or you could handle the possibility by checking first, e.g. `if (! defined($b{$a[1]})) { warn "Undefined key in hash for value $a[1]"; next; }` – TLP Nov 05 '21 at 12:33
  • 5
    With this much complexity, you would be much better off writing this whole thing in Perl and skipping this weird shell script integration into long one-liner pipes. Perl is much better at this than shell script. – TLP Nov 05 '21 at 12:35
  • 1
    This is (at least) the third time you've posted the question without really adding any information. – AKHolland Nov 05 '21 at 13:16
  • @AKHolland What types of information do you want? I put the script and I put the inputs !! – SAAD Nov 05 '21 at 13:43
  • 1
    I already narrowed down the problem for you in your previous query, but you seem immune to being helped. Once again: the second column of the output from the 'uniq' command is being used to look up a value in the all.hmm.ps.len file but it's not finding match.. Why don't you look at the output from the uniq command and look at the all.hmm.ps.len file and see what's mismatched. – Dave Mitchell Nov 05 '21 at 14:06
  • 1
    The first `perl` program outputs nothing for the file you provided... – ikegami Nov 05 '21 at 15:18

1 Answers1

4

So you have this error:

Illegal division by zero at -e line 14, <> line 1

And you say that line 14 is this:

$r=($a[3]-$a[2])/$b{$a[1]};

Well, there's only one division in that line of code, so it seems clear that when you're executing that line $b{$a[1]} contains zero (but note that because you don't have use warnings in your code, it's possible that it might also contain an empty string or undef and that's being silently converted to a zero).

So as the programmer, your job is to trace through your code and find out where these values are being set and what might be causing it not to contain a value that can be used in your division.

I would be happy to help work out what the problem is, but for a few problems:

  • Your program reads from two input files and you've only given us one.
  • Your variables all have single-letter names, making it impossible to understand what they are for.
  • Your use of a string of command-line programs is bizarre and renders your code pretty much unreadable.

If you want help (not just from here, but from any forum), your first move should be to make your code as readable as possible. It's more than likely that by doing that, you'll find the problem yourself.

But your current habit of reposting the question with incremental changes is winning you no friends here.

Update: One idea might be to rewrite your line as something like this:

if (exists $b{$a[1]}) {
  $r=($a[3]-$a[2])/$b{$a[1]};
} else {
  warn "Hey, I can't find the key '$a[1]' in the hash \%b\n";
}
Dave Cross
  • 68,119
  • 3
  • 51
  • 97