1

I have a hash of lists that is not getting populated.

I checked that the block at the end that adds to the hash is in fact being called on input. It should either add a singleton list if the key doesn't exist, or else push to the back of the list (referenced under the right key) if it does.

I understand that the GOTO is ugly, but I've commented it out and it has no effect.

The problem is that when printhits is called, nothing is printed, as if there are no values in the hash. I also tried each (%genomehits), no dice.

THANKS!

#!/usr/bin/perl
use strict;
use warnings;

my $len = 11; # resolution of the peaks

#$ARGV[0] is input file
#$ARGV[1] is call number
# optional -s = spread number from call
# optional -o specify output file name
my $usage = "see arguments";
my $input = shift @ARGV or die $usage;
my $call = shift @ARGV or die $usage;
my $therest = join(" ",@ARGV) . " ";
print "the rest".$therest."\n";
my $spread = 1;
my $output = $input . ".out";
if ($therest =~ /-s\s+(\d+)\s/) {$spread = $1;}
if ($therest =~ /-o\s+(.+)\s/) {$output = $1;}

# initialize master hash
my %genomehits = ();

foreach (split ';', $input) {
    my $mygenename = "err_naming";
    if ($_ =~ /^(.+)-/) {$mygenename = $1;}

    open (INPUT, $_);
    my @wiggle = <INPUT>;

    &singlegene(\%genomehits, \@wiggle, $mygenename);

    close (INPUT);
}

&printhits;

#print %genomehits;
sub printhits {
    foreach my $key (%genomehits) {
        print "key: $key , values: ";
    foreach (@{$genomehits{$key}}) {
        print $_ . ";";
    }
    print "\n";
    }
}

sub singlegene {
 # let %hash be the mapping hash
 # let @mygene be the gene to currently process
 # let $mygenename be the name of the gene to currently process

    my (%hash) = %{$_[0]};
    my (@mygene) = @{$_[1]};
    my $mygenename = $_[2];

    my $chromosome;
    my $leftbound = -2;
    my $rightbound = -2;

    foreach (@mygene) {
        #print "Doing line ". $_ . "\n";

        if ($_ =~ "track" or $_ =~ "output" or $_ =~ "#") {next;}

        if ($_ =~ "Step") {
            if ($_ =~ /chrom=(.+)\s/) {$chromosome = $1;}
            if ($_ =~ /span=(\d+)/) {$1 == 1 or die ("don't support span not equal to one, see wig spec")};
            $leftbound = -2;
            $rightbound = -2;
            next;
        }

        my @line = split /\t/, $_;
        my $pos = $line[0];
        my $val = $line[-1];

        # above threshold for a call
        if ($val >= $call) {
            # start of range
            if ($rightbound != ($pos - 1)) {
                $leftbound = $pos;
                $rightbound = $pos;
            }
            # middle of range, increment rightbound
            else {
                $rightbound = $pos;
            }

            if (\$_ =~ $mygene[-1]) {goto FORTHELASTONE;}
        }
        # else reinitialize: not a call
        else {
            FORTHELASTONE:
            # typical case, in an ocean of OFFs
            if ($rightbound != ($pos-1)) {
                $leftbound = $pos;
            }
            else {
            # register the range
                my $range = $rightbound - $leftbound;
                for ($spread) {
                    $leftbound -= $len;
                    $rightbound += $len;
                }
                #print $range . "\n";

                foreach ($leftbound .. $rightbound) {
                    my $key = "$chromosome:$_";
                    if (not defined $hash{$key}) {
                        $hash{$key} = [$mygenename];
                    }
                    else { push @{$hash{$key}}, $mygenename; }
                }
            }
        }

    }

}
Brad Gilbert
  • 33,846
  • 11
  • 78
  • 129
Overflown
  • 1,830
  • 2
  • 19
  • 25
  • I would highly recommend just shifting your arguments off of the argument stack also. Technically directly refrencing them is faster but it's harder to read and may actually have contributed to your confusion here. – Jeremy Wall Jun 10 '09 at 02:12

2 Answers2

4

You are passing a reference to %genomehits to the function singlegene, and then copying it into a new hash when you do my (%hash) = %{$_[0]};. You then add values to %hash which goes away at the end of the function.

To fix it, use the reference directly with arrow notation. E.g.

my $hash = $_[0];
...
$hash->{$key} = yadda yadda;
friedo
  • 65,762
  • 16
  • 114
  • 184
  • thanks a lot; I was using references like this before but perl told me this was deprecated semantics. Never listen to the compiler! – Overflown Jun 09 '09 at 15:49
  • I'm highly suprised perl told you that. As far as I know it's not deprecated at all. – Jeremy Wall Jun 10 '09 at 02:13
  • $hash->{$key} == ${$hash}{$key} == $$hash{$key}; some people like #2 or #3 more than #1. #1 is the most common, but #3 has the advantage of being easily interpolated into strings. – ephemient Jun 10 '09 at 02:48
2

I think it's this line:

my (%hash) = %{$_[0]};

You're passing in a reference, but this statement is making a copy of your hash. All additions you make in singlegene are then lost when you return.

Leave it as a hash reference and it should work.

PS - Data::Dumper is your friend when large data structures are not behaving as expected. I'd sprinkle a few of these in your code...

use Data::Dumper; print Dumper \%genomehash;

jmanning2k
  • 9,297
  • 4
  • 31
  • 23