
I'm using BioPerl to find GO terms for genes. I retrieve an HTML file, convert it to text, strip the extra spaces and newlines, and then try to walk through the resulting array.

However, I keep getting "Use of uninitialized value" messages when I access elements of the array. I put in many checks to make sure the array is not empty and that I'm not going out of bounds. How can I get rid of these messages?

I reposted the code in a more readable format. Thank you for your help.

It seems to parse the correct data out of the HTML, so I don't know what's wrong.

#!/usr/bin/perl -w
use strict;
use LWP::Simple;
use HTML::TreeBuilder;
use HTML::FormatText;

my $URL         = get("http://amigo.geneontology.org/amigo/term/GO:0000001");
my $Format      = HTML::FormatText->new;
my $TreeBuilder = HTML::TreeBuilder->new;
$TreeBuilder->parse($URL);
my $Parsed = $Format->format($TreeBuilder);
print "$Parsed";
my @parsed = split( /[ ]{2,}|(\n+)|(\r+)/, $Parsed );
if ( @parsed == 1 ) { return; }

my %termhash;
my $count = 0;

while ( $count < @parsed ) {
    if ( defined $parsed[$count] && $parsed[$count] eq 'Name' ) {
        my $count2 = $count;
        while ( ( $parsed[$count2] ne 'Feedback' ) && ( $count2 < @parsed ) ) {
            $count2++;
        }
        $count2--;
        @parsed = @parsed[ $count .. $count2 ];    # Gets the slice of the array needed
        last;
    }
    $count++;
}

if ( @parsed <= 1 ) { return; }
print "\n";

print @parsed;

$count = 0;
while ( $count < @parsed ) {
    if ( $parsed[$count] eq 'Name' ) {
        while ( $parsed[$count] ne 'Ontology' && ( $count < @parsed )) {
            $termhash{'Name'} .= $parsed[$count];
            $count++;
        }
    }
    if ( $parsed[$count] eq 'Ontology' ) {
        while ( $parsed[$count] ne 'Synonyms' && ( $count < @parsed )) {
            $termhash{'Category'} .= $parsed[$count];
            $count++;
        }
    }
    if ( $parsed[$count] eq 'Synonyms' ) {
        while ( $parsed[$count] ne 'Definition' && ( $count < @parsed )) {
            $termhash{'Aliases'} .= $parsed[$count];
            $count++;
        }
    }
    if ( $parsed[$count] eq 'Definition' ) {
        while ( $parsed[$count] ne 'Comment' && ( $count < @parsed )) {
            $termhash{'Definition'} .= $parsed[$count];
            $count++;
        }
    }
    if ( $parsed[$count] eq 'Comment' ) {
        while ( $parsed[$count] ne 'History' && ( $count < @parsed )) {
            $termhash{'Comment'} .= $parsed[$count];
            $count++;
        }
    }
    if ( $parsed[$count] eq 'History' ) {
        while ( $parsed[$count] ne 'Subset' && ( $count < @parsed )) {
            $termhash{'Version'} .= $parsed[$count];
            $count++;
        }
    }
    if ( $parsed[$count] eq 'Subset' ) {
        while ( ( $parsed[$count] ne 'Community' ) && ( $count < @parsed ) ) {
            $count++;
        }
    }
    if ( $parsed[$count] eq 'Community' ) {
        while ( ( $parsed[$count] ne 'Related' ) && ( $count < @parsed ) ) {
            $count++;
        }
    }
    if ( $parsed[$count] eq 'Related' ) {
        for ( $count < @parsed ) {
            $termhash{'Definition references'} .= $parsed[$count];
            $count++;
        }
    }
}
if ( $termhash{'Definition'} =~ m/OBSOLETE/ ) { $termhash{'Is obsolete'} = 1 }
else { $termhash{'Is obsolete'} = 0 }
#print %termhash;

The main error messages are:

    Use of uninitialized value $parsed[127] in string ne at /home/adur/workspace/BI7643/ParseGOhtml.pl line 23.
    Use of uninitialized value $parsed[1] in print at /home/adur/workspace/BI7643/ParseGOhtml.pl line 35.
    Use of uninitialized value $parsed[1] in string ne at /home/adur/workspace/BI7643/ParseGOhtml.pl line 42.
    Use of uninitialized value $parsed[1] in concatenation (.) or string at /home/adur/workspace/BI7643/ParseGOhtml.pl line 41.
    Use of uninitialized value $parsed[17] in string ne at /home/adur/workspace/BI7643/ParseGOhtml.pl line 48.
    Use of uninitialized value $parsed[17] in concatenation (.) or string at /home/adur/workspace/BI7643/ParseGOhtml.pl line 47.
    Use of uninitialized value $parsed[29] in string ne at /home/adur/workspace/BI7643/ParseGOhtml.pl line 54.
    Use of uninitialized value $parsed[29] in concatenation (.) or string at /home/adur/workspace/BI7643/ParseGOhtml.pl line 53.
    Use of uninitialized value $parsed[41] in string ne at /home/adur/workspace/BI7643/ParseGOhtml.pl line 60.
    Use of uninitialized value $parsed[41] in concatenation (.) or string at /home/adur/workspace/BI7643/ParseGOhtml.pl line 59.
    Use of uninitialized value $parsed[79] in string ne at /home/adur/workspace/BI7643/ParseGOhtml.pl line 66.
    Use of uninitialized value $parsed[79] in concatenation (.) or string at /home/adur/workspace/BI7643/ParseGOhtml.pl line 65.
    Use of uninitialized value $parsed[83] in string ne at /home/adur/workspace/BI7643/ParseGOhtml.pl line 72.
    Use of uninitialized value $parsed[83] in concatenation (.) or string at /home/adur/workspace/BI7643/ParseGOhtml.pl line 71.
    Use of uninitialized value $parsed[95] in string ne at /home/adur/workspace/BI7643/ParseGOhtml.pl line 77.
    Use of uninitialized value $parsed[107] in string ne at /home/adur/workspace/BI7643/ParseGOhtml.pl line 82.

eugheugh
  • It may be helpful to include more details from the error message. – G. Cito May 22 '15 at 04:24
  • What you are doing with `count2--` doesn't make much sense to me. – stevesliva May 22 '15 at 04:31
  • You should not compress your code so much that it is no longer readable. Make use of whitespace. And `use warnings`. And don't use `use Switch`, the module is deprecated -- use if-else instead. – TLP May 22 '15 at 07:50
  • This error message corresponds to the second to last case: Use of uninitialized value $parsed[101] in string ne at /home/adur/workspace/BI7643/GOexamp.pl line 105, line 1. I didn't include the error message earlier because it includes information regarding working code outside of what I posted. Count2-- moves the index back so that 'Feedback' isn't included in the array slice. The reason I compressed the switch part is because I lose points for whitespace. I'll replace it with if-else. – eugheugh May 22 '15 at 17:14

1 Answer

Did you mean to leave the argument unquoted in the second and third calls to htmlparse?

htmlparse('GO:0000001');
htmlparse(GO:0000002);
htmlparse(GO:0000003); 

They should be:

htmlparse('GO:0000001');
htmlparse('GO:0000002');
htmlparse('GO:0000003'); 
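
As a rough sketch of why the quotes matter, assume `htmlparse` receives the GO ID as a plain string and builds the AmiGO URL from it. The `term_url` sub below is hypothetical and only stands in for that first step; the URL pattern is the one used in the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not the poster's htmlparse): it only checks the ID
# and builds the URL; it does not fetch or parse anything.
sub term_url {
    my ($go_id) = @_;
    die "Expected an ID like 'GO:0000001'\n"
        unless defined $go_id && $go_id =~ /\AGO:\d{7}\z/;
    return "http://amigo.geneontology.org/amigo/term/$go_id";
}

# Each ID is passed as a quoted string.
for my $go_id ( 'GO:0000001', 'GO:0000002', 'GO:0000003' ) {
    my $url = term_url($go_id);
    print "$url\n";
}
```

Checking the argument up front means a malformed ID fails loudly before any request is made.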

Also, make sure to add the following to the top of your file:

use warnings;
use diagnostics;

Post back the resulting error messages if there is still a problem.
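
One thing those warnings may be pointing at (an assumption on my part, based only on the posted code): when the pattern given to `split` contains capturing groups, `split` returns an extra field per group for every separator, and a group that did not take part in the match comes back as undef. The pattern in the question has two such groups. A minimal sketch with made-up input text:

```perl
use strict;
use warnings;

# Made-up sample text; the real input is the formatted AmiGO page.
my $text = "Name  mitochondrion inheritance\nOntology  biological_process";

# Same pattern as in the question: the two capturing groups add extra
# fields to the result, and a group that did not match yields undef.
my @with_captures = split /[ ]{2,}|(\n+)|(\r+)/, $text;

# Same separators without capturing groups: only the text between
# separators is returned.
my @without_captures = split /[ ]{2,}|\n+|\r+/, $text;

my $undef_count = grep { !defined($_) } @with_captures;

printf "with captures:    %d fields, %d undef\n",
    scalar @with_captures, $undef_count;
printf "without captures: %d fields\n", scalar @without_captures;
```

If that is what is happening, dropping the parentheses from the pattern (or using `(?:...)`) should remove the undef entries at the source.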

Also, according to http://geneontology.org/page/lead-database-guide, you can query the online database directly without having to parse HTML. They provide the following example using their tool:

GOshell.pl -d go_latest -dbuser go_select -dbauth amigo -port 4085 -h mysql.ebi.ac.uk

Relevant CPAN information: http://cpansearch.perl.org/src/SJCARBON/go-db-perl-0.04/doc/go-db-perl-doc.html

harvey
  • That actually requires having the database downloaded. I can't use it, but thank you. It's mainly a parsing problem now. – eugheugh May 22 '15 at 19:36
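
On the parsing side, here is one possible way to group the tokens without tracking indices by hand, so nothing can be read past the end of the array. This is a sketch under assumptions: the token list is invented, the heading-to-key map only copies three of the headings from the question, and unlike the original loops it does not append the heading itself to the stored value.

```perl
use strict;
use warnings;

# Invented token list standing in for the cleaned-up @parsed array,
# including an undef and an empty string like a messy split might produce.
my @parsed = (
    'Name',     undef, 'mitochondrion inheritance',
    'Ontology', '',    'biological_process',
    'Synonyms', 'mitochondrial inheritance',
);

# Section heading on the page => key used in %termhash (from the question).
my %key_for = (
    Name     => 'Name',
    Ontology => 'Category',
    Synonyms => 'Aliases',
);

my %termhash;
my $current_key;

# Skip undefined and empty tokens first, then walk the rest once.
for my $token ( grep { defined($_) && length($_) } @parsed ) {
    if ( exists $key_for{$token} ) {
        $current_key = $key_for{$token};    # a new section starts here
        next;
    }
    next unless defined $current_key;       # text before the first heading
    $termhash{$current_key} .= $token;
}

print "$_: $termhash{$_}\n" for sort keys %termhash;
```

Iterating over the list itself replaces the repeated `$count < @parsed` checks, which in the posted loops come after the element has already been read.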