0

I'm trying to compare the contents of a file with a Hash of hashes. For this I am using map, if and exists, though ineffectively yet.

Basically, I want to know if columns 0 to 2 of the file exist in a hash. If so, then I want to find whether column 3 exists as key in the inner hash. My "old file.txt" is a tab separated file from which I produce the following hash:

old file.txt:

A    s.av    u
B    s.bv    u
C    s.av    u
C    s.cv    m

Hash:

my %oldhash = {
  'A' => {'s.av' => ['u']},
  'B' => {'s.bv' => ['u']},
  'C' => {'s.av' => ['u'], 's.cv' => ['m']},
};

Look if the following tab-separated columns from "new file.txt" exist in the hash:

D    Db    Dc    s.av   #cols 0 - 2 do not exist in hash
E    A     Ab    d.ef   #column 1 exists, but column 3 doesn't, so nothing is done
E    A     Ac    s.av   #col 1 and 3 exist, so the new file will have the value of $oldhash{A}{s.av}
B    Bb    B     s.bv   #col0 and 3 exist, so I'll include the value of $oldhash{B}{s.bv}

Notice that cols 0 and 2 both exist in the hash, but this is not important since I only need one of the columns to exist.

The output can be exactly as the testing file with an added column that takes u or m from the other file. Example:

D    Db    Dc    s.av       #inserts empty column
E    A     Ab    d.ef       #inserts empty column
E    A     Ac    s.av   u   #with u inserted
B    Bb    B     s.bv   u   #with u inserted

This is where I got so far, but I'm getting a exists argument is not a HASH or ARRAY element or a subroutine at myfile.pl line 24:

#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $dir='D:\\';

open my $oldfile, "<", "$dir\\old file.txt";
open my $newfile, "<", "$dir\\new file.txt";

my (%oldhash);

# creates the hash of hashes
while (my $odata = <$oldfile>){
    chomp $odata;
    my @oline = split /\t/, lc $odata;
    push @{$oldhash{$oline[1]}{$oline[2]}}, $oline[3];
}

# does the comparison between the file and the hash
while (my $newlines = <$newfile>){
    chomp $newlines;
    my @line = split /\t/, $newlines;
    if (exists map {$oldhash{$_}} @line[0..2]) {
        print $oldhash{$_};
    }
}

close $updatedtax;
close $oldtax;

I appreciate all the help that you can give me! Thank you in advance

Sos
  • 1,783
  • 2
  • 20
  • 46
  • Please show the contents of `old file.txt`. Your `%oldhash` hash structure is far from ideal for this purpose. Also please show your required output for this data set – Borodin May 02 '14 at 11:35
  • Are you aware that you are opening `old file.txt` with a file handle `$newfile`, and `new file.txt` with a file handle of `$oldfile`, and then you close `$oldtax` and `$updatedtax`? – Borodin May 02 '14 at 11:37

1 Answers1

3

exists requires a single array or hash element as its parameter. You have passed it a list of scalar values whose origin has been lost once they have gone through map.

You could write your test as

if ( grep { exists $oldhash{$_} }, @line[0..2] ) { ... }

but I think there are better ways to write a solution.

I think this does what you want, but with the data you've given it outputs just u twice. You haven't shown a required output as I requested, so is that right?

I've inverted the keys that you chose for your own %oldhash so that a case can be rejected immediately just by checking for the existence of the fourth column (s.av etc.) in the hash.

I've also added use autodie, as it's essential to check whether an open has been successful before you go ahead and use data from the file handle, and this avoids checking every case explicity.

Finally I've added chdir 'D:\\' so that you don't have to prefix the file names with the path for every open.

The output includes the final "comment" column from new_file.txt that gave rise to it. I am sure you can alter the print statement to give the output that you desire.

use strict;
use warnings;
use autodie;

use Data::Dump;

chdir 'D:\\';

open my $old_fh, '<', 'old_file.txt';

my %old_data;
while (<$old_fh>) {
  chomp;
  my @fields = split /\t/;
  $old_data{$fields[1]}{$fields[0]} = $fields[2];
  print "@fields\n";
}
close $old_fh;

open my $new_fh, '<', 'new_file.txt';

while (<$new_fh>) {

  chomp;
  my @fields = split /\t/;

  my $new = '';
  if (my $list = $old_data{$fields[3]}) {
    my @possible = grep defined, @{$list}{@fields[0,1,2]};
    $new = $possible[0] if @possible;
  }

  print join("\t", @fields[0..3], $new, $fields[4]), "\n";
}

The contents of %old_data after reading the file look like this

(
  "s.av" => { A => "u", C => "u" },
  "s.bv" => { B => "u" },
  "s.cv" => { C => "m" },
)

output

D Db  Dc  s.av    #cols 0 - 2 do not exist in hash
E A Ab  d.ef    #column 1 exists, but column 3 doesn't, so nothing is done
E A Ac  s.av  u #col 1 and 3 exist, so the new file will have the value of $oldhash{A}{s.av}
B Bb  B s.bv  u #col0 and 3 exist, so I'll include the value of $oldhash{B}{s.bv}
Borodin
  • 126,100
  • 9
  • 70
  • 144
  • Sorry, I had missed the output in my post. But I think I can already play around with your answer! Thank you so much! – Sos May 02 '14 at 12:16
  • @Sosi: I've fixed it to give the output that you request – Borodin May 02 '14 at 12:25
  • @Sosi: Altered slightly to use `grep` instead of a `for` loop for tidiness and clarity – Borodin May 02 '14 at 12:28
  • Thanks! What does `dd \%old_data;` do? – Sos May 02 '14 at 13:14
  • @Sosi: I'm sorry, that is a legacy from my debugging. It's a function provided by [`Data::Dump`](https://metacpan.org/module/Data::Dump) (which IMO is highly superior to `Data::Dumper` because its output is so much more readable). It simply dumps the hash to the current file handle, and I used it to create the dump that I included in my answer. I've removed it. – Borodin May 02 '14 at 13:23
  • thanks for your help. I'll also have a look at `Data::Dump`, good thing you forgot about it there – Sos May 02 '14 at 13:53
  • a couple of comments: 1.the final printing line should be `print join("\t", @fields[0..2], $new, $fields[3]), "\n";` (note the argument of `$fields`); 2. the output that I get is a ref to an array corresponding to `$new`. If I print instead `print join("\t", @fields[0..3], @{$new}, $fields[4]), "\n";` I get errors when $new is undefined. How could I solve this? – Sos May 02 '14 at 15:02
  • ok, finally changing `$new` to `$new = [''];` and the print to `print join("\t", @fields[0..2], @{$new}, $fields[3]), "\n";` solves these problems – Sos May 02 '14 at 15:43
  • @Sosi: You are wrong in both of your comments. My `print` statement prints the first four fields from `new_file.txt`, followed by the matching value from `old_file.txt`, followed by the last (comment) field from `new_file.txt`. If you do as you describe then you will get, for example, `E A Ac u s.av`. What do you mean by *"the output that I get is a ref to an array "*? The only output is a tab-separated text file. And `$new` is *never* undefined: it is initialised to the empty string `''`. – Borodin May 02 '14 at 16:55
  • @Sosi: Please raise a new question showing what you have done to break this solution. Perhaps on [Stack Echange's *Code Review* site](http://codereview.stackexchange.com) – Borodin May 02 '14 at 16:58
  • I was having those errors that I posted above when trying to implement your code in my example. It may have been something that was slightly different in my example and/or in the way I translated your answer to my real example. Either way, your code works for the working example that I showed, so kudos! Thank you very much! – Sos May 05 '14 at 19:34
  • One more question (and I'm sorry for extending these comments further) but why are you dereferencing `$list` into an array in `@{$list}{@fields[0,1,2]}` if `$list` is a ref to a hash? – Sos May 08 '14 at 14:52
  • That is a hash *slice*. Elements `a`, `b` and `c` of hash `%data` are `($data{a}, $data{b}, $data{c})` or `@data{qw/ a b c /}`. The '@` sigil is saying that the expresion is a *list*, whereas the `$` sigil in `$hash{a}` says that the expression is a *scalar*. Incientally, that is why you get a warning if you write `@array[1]` instead of `$array[1]`, but `@array[1,2,3]` is correct – Borodin May 08 '14 at 15:28
  • Thank you so much, you don't know how much I learnt from this thread. Thank you! – Sos May 09 '14 at 09:31