Parse tab delimited file into hash of array

Question

I'm a Perl novice attempting to do the following:

1) Take a user input.
2) Match the input against column 1 of file 1 and store the corresponding column 2 values in a hash, hash of arrays, or hash of hashes. (The code below stores them in a hash of arrays, but I'm not sure this is the best choice for step 3.)
3) Find all lines in file 2 (if any) whose column 1 equals one of those column 2 values from file 1.

For simplicity, I've provided a sample file below.

I'm attempting to take a user input such as 'AAA', find it in column 1 of the input file, and use it as a hash key that holds all the corresponding values from column 2.

My input file has multiple instances of 'AAA' in column 1 with different values in column 2, and the same 'AAA'/'BBB' pair also appears more than once in columns 1 and 2. I believe that to output this properly I need a hash of hashes, but I'm not sure of the syntax.
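For reference, a hash of hashes could be built roughly like this. It is a minimal sketch, assuming tab-separated input like the sample rows in this question; the in-memory string `$data` and the hard-coded `$value` are stand-ins for the real file and the STDIN input. Because the inner keys are unique, the repeated `AAA BBB` pair is stored once, with a count of 2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows held in a string so the sketch runs on its own
# (stand-in for the real input file).
my $data = "AAA\tBBB\nAAA\tCCC\nAAA\tBBB\nBBB\tDDD\nCCC\tAAA\n";

# Hash of hashes: $hoh{column1}{column2} = number of times the pair occurs.
my %hoh;
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($column1, $column2) = split /\t/, $line;
    $hoh{$column1}{$column2}++;    # the inner hash is created on first use
}
close $fh;

# Distinct column 2 values for one column 1 key (hard-coded instead of STDIN).
my $value = 'AAA';
for my $column2 (sort keys %{ $hoh{$value} }) {
    print "$value\t$column2\t$hoh{$value}{$column2}\n";
}
```

If you only need the distinct values and not the counts, `$hoh{$column1}{$column2} = 1;` works just as well.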

I've tried searching this site and found some examples but I'm afraid I'm only confusing myself more.

Example of the input file (tab-delimited):

AAA     BBB
AAA     CCC
AAA     BBB
BBB     DDD
CCC     AAA

Example of my code

#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
use Data::Dumper;

# declare values
my %hash = ();

# Get protein name from user
print "Get column 1 value: ";
my $value = <STDIN>;
chomp $value;

# open input file (three-argument open with a lexical filehandle)
open my $fh, '<', 'file' or die "unable to open file: $!\n";

while (my $line = <$fh>) {
    chomp($line);
    my ($column1, $column2) = split /\t/, $line;

    # collect every column 2 value whose column 1 matches the user input
    if ($column1 eq $value) {
        push @{ $hash{$column1} }, $column2;
    }
}

close $fh;

print Dumper(\%hash);

Code output

$VAR1 = {
         'AAA' => [
                    'BBB',
                    'CCC'
                  ]
        };

My question: will my current hash-of-arrays setup work well for reading column 1 of file 2 and comparing it with column 2 of file 1, or should I approach it differently?
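One way to use the hash of arrays for that comparison, sketched under the assumption that file 2 only needs to be checked against the values collected for a single key: turn the array stored under the user's key into a lookup hash, then test each column 1 value from file 2 with `exists`. The hard-coded `%hash`, `$value`, and candidate list are illustrative stand-ins for the real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hash of arrays as the question's code would build it (values hard-coded here).
my %hash  = ( AAA => [ 'BBB', 'CCC', 'BBB' ] );
my $value = 'AAA';

# Flatten the array for the chosen key into a lookup hash; duplicates collapse.
# The "|| []" avoids an error if the key was never seen.
my %wanted;
$wanted{$_} = 1 for @{ $hash{$value} || [] };

# Each column 1 value from file 2 can then be tested with exists.
for my $candidate (qw(BBB DDD CCC)) {
    if (exists $wanted{$candidate}) {
        print "$candidate found in column 2 of file 1\n";
    }
}
```

The hash of arrays is fine as an intermediate step; what makes the lookup fast is having the values as hash keys.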

Hash keys are unique, so your desired structure is not possible. On the other hand, [arrays of arrays and hashes of arrays](http://perldoc.perl.org/perldsc.html) make more sense. – mpapec, Oct 28 '14 at 06:53

score 1 · Accepted Answer · answered Oct 28 '14 at 06:50


Your current code overwrites the value of $hash{$column1} on each iteration. You can use push to append a new element to the array instead of overwriting it. Change this line:

$hash{$column1} = [$column2];

to

push @{ $hash{$column1} }, $column2;

Note that the data structure you're creating is not a hash of hashes but a hash of arrays.
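To see the difference the answer describes, here is a small sketch using the three `AAA` rows from the sample file, hard-coded as `@rows` instead of read from disk. Note that `push` keeps duplicates, so `BBB` appears twice; dropping them would need a hash of hashes or a similar uniqueness check.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three AAA rows from the sample file, hard-coded for illustration.
my @rows = ( [ 'AAA', 'BBB' ], [ 'AAA', 'CCC' ], [ 'AAA', 'BBB' ] );

# Assigning a new one-element array each time replaces the previous value.
my %overwritten;
for my $row (@rows) {
    my ($column1, $column2) = @$row;
    $overwritten{$column1} = [$column2];
}

# push appends to the array, creating it on first use.
my %pushed;
for my $row (@rows) {
    my ($column1, $column2) = @$row;
    push @{ $pushed{$column1} }, $column2;
}

print "overwrite: @{ $overwritten{AAA} }\n";   # overwrite: BBB
print "push:      @{ $pushed{AAA} }\n";        # push:      BBB CCC BBB
```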


ThisSuitIsBlackNot



Thanks @ThisSuitIsBlackNot. I was storing each column 2 value for the user input in a hash of arrays, but I wasn't sure whether that was the best approach for the second step or whether I needed a hash of hashes. For the second step, I want to take each column 2 value from file 1 and check whether it exists in column 1 of file 2. If it does, I would like to print column 2 of file 2. – CJ87 Oct 28 '14 at 13:57

@CJ87 You only mentioned one file in your question...would that be file1 or file2? In general, to check if value Y exists in a set of values X, you can store all of X in a hash and check if `exists $hash{$Y}`. You should ask a new question if you're still not sure how to do your second step, but I would do: 1) Read file1, column2 into a hash. column1 is unnecessary so just ignore it. 2) Loop through file2. If column1 exists in the hash, print column2. Nothing from file2 needs to be stored in a hash, since you can print each line as you process it. – ThisSuitIsBlackNot Oct 28 '14 at 14:22
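The two steps in that comment could look roughly like this. It is a sketch, not the commenter's code: the in-memory strings stand in for file1 and file2, the contents of file2 are invented for illustration, and it keeps the question's `$column1 eq $value` filter (the comment suggests ignoring column 1 entirely, which would simply drop that condition).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-ins for the two tab-separated files; file2's rows are hypothetical.
my $file1 = "AAA\tBBB\nAAA\tCCC\nAAA\tBBB\nBBB\tDDD\nCCC\tAAA\n";
my $file2 = "BBB\tx1\nDDD\tx2\nCCC\tx3\n";
my $value = 'AAA';    # would normally come from STDIN

# Step 1: read file1 and store column 2 as hash keys for rows matching $value.
my %wanted;
open my $in1, '<', \$file1 or die "cannot open file1: $!";
while (my $line = <$in1>) {
    chomp $line;
    my ($column1, $column2) = split /\t/, $line;
    $wanted{$column2} = 1 if $column1 eq $value;
}
close $in1;

# Step 2: stream file2 and print column 2 whenever column 1 is in %wanted.
my @matches;
open my $in2, '<', \$file2 or die "cannot open file2: $!";
while (my $line = <$in2>) {
    chomp $line;
    my ($column1, $column2) = split /\t/, $line;
    if (exists $wanted{$column1}) {
        print "$column2\n";
        push @matches, $column2;    # collected only so the result is easy to inspect
    }
}
close $in2;
```

With real files, replace `\$file1` and `\$file2` with the file names. Printing inside the second loop is all that is needed; nothing from file2 has to be kept.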

Also, it seems like this particular post was an [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). That's okay, sometimes they're hard to avoid, but if you always explain *what* you're trying to accomplish instead of *how* you're trying to do it, you will usually get better answers. Your explanation in your comment above is an excellent example of the *right* way to ask a question. If you explain your goal at the beginning, you will usually have to ask fewer follow-up questions. – ThisSuitIsBlackNot Oct 28 '14 at 14:25
