It's me again. I am having trouble creating a hash of arrays, even after reading the documentation. I want the HoA to hold the log-odds score of a motif (a shorter subsequence) within a DNA sequence. I want the structure to look like:

$HoA{$id}[$pos] = #score based on the position

Where the $id is the sequence ID and the $pos is the position within the sequence at which the motif starts. I input a .txt file containing DNA sequences that is formatted as such:

>Sequence_1
TCAGAACCAGTTATAAATTTATCATTTCCTTCTCCACTCCT
>Sequence_2
CCCACGCAGCCGCCCTCCTCCCCGGTCACTGACTGGTCCTG
>Sequence_3
TCGACCCTCTGGAACCTATCAGGGACCACAGTCAGCCAGGCAAG

For example: a motif at position 2 for Sequence 1 would be 'AGA'. Below is the code I have so far (it is simplified a little):

use strict;
use warnings;
use Data::Dumper; 

print "Please enter the filename of the fasta sequence data: ";
my $filename1 = <STDIN>;

#Remove newline from file
chomp $filename1;

#Open the file and store each dna seq in hash
my %HoA = ();
my %loscore = ();
my %maxscore = ();
my $id = '';
open (FILE, '<', $filename1) or die "Cannot open $filename1.",$!;
my $dna;
while (<FILE>)
{
    if($_ =~ /^>(.+)/)
    {
        $id = $1; #Stores 'Sequence 1' as the first $id, etc.
    }
    else
    {
        $HoA{$id} = [ split(//) ]; #Splits the contents to allow for position reference later
        $loscore{$id} .= 0; #Creates a hash with each id number to have a log-odds score (initial score 0)
        $maxscore{$id} .= -30; #Creates a hash with each id number to have a maxscore (initial score -30)
    }
}
close FILE;

my $width = 3;

my %logodds;  #I know there is a better way to do this - this is just for simplicity
$logodds{'A'}[0] = 0.1;
$logodds{'A'}[1] = 0.2;
$logodds{'A'}[2] = 0.3;
$logodds{'C'}[0] = 0.2;
$logodds{'C'}[1] = 0.5;
$logodds{'C'}[2] = 0.2;
$logodds{'G'}[0] = 0.3;
$logodds{'G'}[1] = 0.2;
$logodds{'G'}[2] = 0.4;
$logodds{'T'}[0] = 0.4;
$logodds{'T'}[1] = 0.1;
$logodds{'T'}[2] = 0.1;

print Dumper (\%logodds);
print "\n\n";
for my $base (qw( A C G T))
{
    print "logodds$base @{$logodds{$base}}\n";
}

my @arr;

foreach $id (keys %HoA)
{   
    for my $pos1 (0..length($HoA{$id})-$width-1)    #Look through all positions the motif can start at
    {
        for my $pos2 ($pos1..$pos1+($width-1)) #look through the positions at a specific motif starting point
        {
            for my $base (qw( A C G T))
            {
                if ($HoA{$id}[$pos2] eq $base)  #If the character matches a base:
                {
                    for my $pos3 (0..$width-1) #for the length of the motif:
                    {
                        $arr[$pos1] += $logodds{$base}[$pos3]; 
                        @{ $loscore{$id}} = @arr; #Throws error here
                    }
                }   
            }   
        }
    }
}
print Dumper(\%loscore);

I keep getting the error: Can't use string ("0") as an ARRAY ref while "strict refs" in use at line 75.

An example of a log-odds score with this data that I want is:

$HoA{'Sequence 1'}[2] = 0.1 + 0.2 + 0.3 = 0.6

So, the log-odds score of the motif 'AGA' that begins at position 2 in Sequence 1 is 0.6. I appreciate all of your patience and help! Let me know if I need to clarify anything.

William
  • Which one is line 75? – Shawn Mar 27 '19 at 15:28
  • The line that states: `@{ $loscore{$id}} = @arr;` throws the error – William Mar 27 '19 at 15:30
  • Earlier you have `$loscore{$id} .= 0;` Appending a number to a string is a bit odd - I'd have used `"0"`, but perl will convert it so it works. The key bit though is `%loscore` holds strings, not array references. Hence that error. – Shawn Mar 27 '19 at 15:40

2 Answers

I see a few issues in your code. Consider these lines:

$HoA{$id} = [ split(//) ];  # Splits the contents to allow for position reference later
$loscore{$id} .= 0;  # Creates a hash with each id number to have a log-odds score (initial score 0)
$maxscore{$id} .= -30;  # Creates a hash with each id number to have a maxscore (initial score -30)

According to your comments, you appear to want to initialize the entries of %loscore and %maxscore with 0 and -30. However, instead of using the good old = sign, you're using the .= operator (which appends strings). I don't think this is what you want, so consider changing the .= to just =.

(Or maybe you meant to use //= instead. That way, if %loscore and %maxscore already have an $id entry, it won't be overwritten. But only you can say for sure if you meant to use the //= operator.)
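
To make the difference between the three operators concrete, here is a small sketch. The hash name %score and its keys are made up for illustration and are not from your script; note that //= needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my %score;   # hypothetical hash, for illustration only

# '=' assigns: the entry is exactly 0 afterwards.
$score{assign} = 0;

# '.=' appends to the string form of the current value.
$score{append} = 0;
$score{append} .= -30;      # now the string "0-30", not a number

# '//=' assigns only when the current value is undefined.
$score{default} = 5;
$score{default} //= 0;      # already defined, so it stays 5
$score{missing} //= 0;      # undefined, so it becomes 0

print "$_ => $score{$_}\n" for sort keys %score;
```

In your loop, .= would also turn the entry into "00", "000", and so on if a sequence ever spanned more than one line of the file.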

So now let's look at $loscore{$id} = 0. This tells us that %loscore is a hash (or "associative array") that, for each entry, takes an $id for the key, and a number as a value.

However, further down in your code you have this:

@{ $loscore{$id} } = @arr;

The fact that $loscore{$id} is wrapped by @{ ... } tells us that the values in %loscore are array references. But we already established above that its values are numbers!

And because you're treating a number as an array reference, Perl sees that as an error.
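
The failure can be reproduced in isolation. This is only a sketch: the ID and the contents of @arr are placeholder values, and eval is used just to catch the error so the script can keep going.

```perl
use strict;
use warnings;

my %loscore;
$loscore{Sequence_1} .= 0;           # leaves the plain string "0", as in the question
my @arr = (0.6, 0.9);                # placeholder scores, for illustration only

# Dereferencing that string as an array dies under "strict refs".
my $ok = eval { @{ $loscore{Sequence_1} } = @arr; 1 };
print "Error: $@" unless $ok;

# Storing a reference to a copy of @arr works.
$loscore{Sequence_1} = [ @arr ];
print "Scores: @{ $loscore{Sequence_1} }\n";
```

Assigning [ @arr ] stores a copy, so later changes to @arr do not alter the scores already saved under that ID.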

What you may have meant to write instead was:

@{ $HoA{$id} } = @arr;

Since the values of the %HoA hash are array references, it makes sense that you would want to dereference them as arrays.

J-L
  • I appreciate your help! After looking at it again, I think I want to use three hashes of arrays: `%HoA, %loscore, %maxscore`. If I want each initial value in `%loscore` to be `0` and each initial value in `%maxscore` to be `-30`, do you have any advice for creating these two hashes of arrays? – William Mar 27 '19 at 16:22
  • I am currently trying `map` to do this. – William Mar 27 '19 at 16:24
  • If you want to initialize each entry of `%loscore` and `%maxscore` to 0 and -30, then they necessarily cannot be "Hashes of Arrays." Instead, they will be "Hashes of Numbers." In other words, if you have a hash that holds array references as values (making it a "Hash of Arrays"), you can't have it hold numbers as its values. Maybe your intention was to initialize `%loscore` as `$loscore{$id} = [ 0 ];`. That way `%loscore` is indeed a "Hash of Arrays," where the arrays hold numbers as its elements. – J-L Mar 27 '19 at 17:35
  • Looking at your code again, I'm thinking that when you wrote `$loscore{$id} .= 0;` you *meant* to write `push @{ $loscore{$id} }, 0;`. (And when you wrote `$maxscore{$id} .= -30;` you meant to write `push @{ $maxscore{$id} }, -30;`.) This change will append 0 (or -30) to a value in `%loscore` (or `%maxscore`) that happens to be an array. Maybe this is what you want? – J-L Mar 27 '19 at 17:40
  • The `push` method you suggested is the closest to what I want. Yes, I want a number associated with each `$id`, like you stated, but I want multiple numbers within each `$id`. So `$loscore{$id}` will have multiple scores that can be accessed by the position in the array that is referenced. – William Mar 27 '19 at 18:50
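
The push approach discussed in these comments can be sketched as below. The ID is the example one from the question, and the loop bound of three positions is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

my (%loscore, %maxscore);
my $id = 'Sequence_1';    # example ID taken from the question's input

# Each push autovivifies the array reference on first use, then appends.
for my $pos (0 .. 2) {    # three positions, chosen only for illustration
    push @{ $loscore{$id} },  0;
    push @{ $maxscore{$id} }, -30;
}

$loscore{$id}[2] = 0.6;   # scores are then addressable by position

print "loscore:  @{ $loscore{$id} }\n";
print "maxscore: @{ $maxscore{$id} }\n";
```

No explicit initialisation such as `$loscore{$id} = [];` is needed, because dereferencing an undefined hash value in this way creates the array automatically.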

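I think this solves the problem. Replace

$loscore{$id} .= 0;
$maxscore{$id} .= -30;

with the following loop, placed after the file has been read and $width has been defined:

foreach $id (keys %HoA)
{
    # scalar(@{ ... }) counts the array's elements; length() on an array
    # reference would measure the string "ARRAY(0x...)" instead
    for my $len (0..(scalar(@{ $HoA{$id} })-$width-1))
    {
        push @{ $loscore{$id} }, 0;
        push @{ $maxscore{$id} }, -30;
    }
}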

Let me know if you have anything to add.
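
For completeness, here is one way the whole scoring step might look once %loscore holds arrays. It is a sketch rather than a drop-in fix: the sequence is inlined instead of read from the file, the input is assumed to contain only A, C, G and T, and the variable names $bases, $last, $start and $offset are mine. Each score is the sum of the log-odds value of the base at each offset within the motif, which is how the 0.6 in the question's worked example is obtained.

```perl
use strict;
use warnings;

my $width = 3;

# Same illustrative log-odds table as in the question: base => score per motif offset.
my %logodds = (
    A => [0.1, 0.2, 0.3],
    C => [0.2, 0.5, 0.2],
    G => [0.3, 0.2, 0.4],
    T => [0.4, 0.1, 0.1],
);

# Sequences would normally come from the FASTA file; one is inlined here.
my %HoA = (
    Sequence_1 => [ split //, 'TCAGAACCAGTTATAAATTTATCATTTCCTTCTCCACTCCT' ],
);

my %loscore;
for my $id (sort keys %HoA) {
    my $bases = $HoA{$id};
    my $last  = @$bases - $width;          # last position a motif can start at
    for my $start (0 .. $last) {
        my $score = 0;
        for my $offset (0 .. $width - 1) {
            # Assumes every character is one of A, C, G, T.
            $score += $logodds{ $bases->[$start + $offset] }[$offset];
        }
        $loscore{$id}[$start] = $score;    # autovivifies the array on first use
    }
}

printf "%s[%d] = %.1f\n", 'Sequence_1', 2, $loscore{Sequence_1}[2];
```

If the sequence lines are read from a file, they should be chomped before splitting, otherwise the trailing newline ends up as an extra array element.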

William