2

I have written a Perl script that takes two input files:

  1. Each line of the first file has a phrase followed by a value in parentheses. Here is an example:

    hello all (0.5)
    hi all (0.63)
    good bye all (0.09)
    
  2. The second file has a list of rules. For example:

    hello all -> salut (0.5)
    hello all -> salut à tous (0.5)
    hi all -> salut (0.63)
    good bye all -> au revoir (0.09)
    good bye -> au revoir  (0.09)
    

The script has to read the second file and, for each line, extract the phrase before the arrow (e.g. for the first line: hello all) and check whether this phrase is present in the first file (in this example it is).

If it is present, the script writes the whole line (here, hello all -> salut (0.5)) to the output. So in this example the output file should be:

hello all -> salut (0.5)
hello all -> salut à tous (0.5)
hi all -> salut (0.63)
good bye all -> au revoir (0.09)

My idea is to put the contents of the first file into a hash table. Here is my script so far:

#!/usr/bin/perl

use warnings;

my $vocabFile = "file1.txt";
my %hashFR =();
open my $fh_infile, '<', $InFile or die "Can't open $InFile\n";

while ( my $Ligne = <$fh_infile> ) {
  if ( $Ligne =~ /\(/ ) {
    my ($cle, $valeur) = split /\(/, $Ligne;
    say $cle; 
    $h{$cle}  = $valeur;
  }     
}

My question now: how do I extract the phrase just before the arrow and look it up in the hash table?

Thank you for your help

amon
Poisson

3 Answers

2

You need to use strict. This would cause your program to fail when it encountered undeclared variables like $InFile (I assume you meant to use $vocabFile). I'm going to ignore those types of issues in the code you posted because you can fix them yourself once you turn on strict.

First, a couple of logic issues with your existing code. You don't seem to actually use the numbers in parentheses that you store as your hash values, but if you ever do want to use them, you should probably get rid of the trailing ):

    my ($cle, $valeur) = split /[()]/, $Ligne;

Next, strip leading and trailing whitespace before using a string as a hash key. You may think "foo" and "foo " are the same word, but Perl doesn't.

$cle =~ s/^\s+//;
$cle =~ s/\s+$//;
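
To make the whitespace point concrete, here is a minimal sketch. The `trim` helper and the `%seen` hash are names made up for this illustration; they are not part of the code in the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): strip whitespace at both ends.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Illustrative hash with a single clean key.
my %seen = ( 'hello all' => 1 );

# What you get for the key when splitting "hello all (0.5)" on a parenthesis:
my $raw = 'hello all ';

print exists $seen{$raw}         ? "found\n" : "missing\n";    # prints "missing"
print exists $seen{ trim($raw) } ? "found\n" : "missing\n";    # prints "found"
```

The trailing space alone is enough to make the lookup fail, which is why both the keys you store and the keys you look up should be trimmed the same way.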

Now, you're already most of the way there. You clearly already know how to read in a file, how to use split, and how to use a hash. You just need to put these all together. Read in the second file:

open my $fh2, "<", "file2" or die "Can't open file2: $!";

while (<$fh2>) {
    chomp;

...get the part before the ->

    my ($left, $right) = split /->/;

...strip leading and trailing whitespace from the key

    $left =~ s/^\s+//;
    $left =~ s/\s+$//;

...and print out the whole line if the key exists in your hash

    print $_, "\n" if exists $hash{$left};
}

...don't forget to close the filehandle when you're done with it

close $fh2;

(although as amon points out, this is not strictly necessary, especially since we're reading and not writing. There's a nice PerlMonks thread dealing with this topic.)
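
Putting the steps above together, a complete version might look roughly like this. It is only a sketch: the sub names `trim`, `load_vocab` and `matching_rules` are invented for the example, and it reads a subset of the question's sample data from in-memory strings so that it runs on its own. With real files you would open file1.txt and file2.txt by name instead.

```perl
use strict;
use warnings;

# Sample data from the question; real code would read file1.txt and file2.txt.
my $vocab = <<'END';
hello all (0.5)
hi all (0.63)
good bye all (0.09)
END

my $rules = <<'END';
hello all -> salut (0.5)
hi all -> salut (0.63)
good bye all -> au revoir (0.09)
good bye -> au revoir  (0.09)
END

# Hypothetical helper: strip leading and trailing whitespace.
sub trim { my ($s) = @_; $s =~ s/^\s+//; $s =~ s/\s+$//; return $s }

# Build the lookup hash from the first file: phrase => value.
sub load_vocab {
    my ($text) = @_;
    my %hash;
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\(/;
        my ( $cle, $valeur ) = split /[()]/, $line;
        $hash{ trim($cle) } = $valeur;
    }
    close $fh;
    return %hash;
}

# Return the rule lines whose left-hand side is a key of the hash.
sub matching_rules {
    my ( $text, %hash ) = @_;
    my @kept;
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        my ($left) = split /->/, $line;
        next unless defined $left;    # skip empty lines
        push @kept, $line if exists $hash{ trim($left) };
    }
    close $fh;
    return @kept;
}

my %hash = load_vocab($vocab);
print "$_\n" for matching_rules( $rules, %hash );
```

With this sample data the first three rules are printed, and the "good bye" rule is dropped because "good bye" on its own is not a phrase in the first file.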

ThisSuitIsBlackNot
  • This is an incredibly nice answer. +1 all the way! Nitpick: closing isn't *that* necessary, and `die !?` is a syntax error ;-) you meant `use autodie` or `die "Can't open file2: $!"`. – amon Sep 20 '13 at 20:39
  • @amon Thank you, and fixed. That's what I get for using the answer box as my compiler ;) – ThisSuitIsBlackNot Sep 20 '13 at 20:46
1
#!/usr/bin/perl

use strict; use warnings;
use Data::Dumper;

open my $FILE_1, '<', shift @ARGV;
open my $FILE_2, '<', shift @ARGV;

my @file1 = <$FILE_1>;
my @file2= <$FILE_2>;

close $FILE_1;
close $FILE_2;
# Store "segments" from the first file in hash:
my %first_file_hash = map { chomp $_; my ($a) = $_ =~ /^(.*?)\s*\(/; $a => 1 } @file1;

my @result;
# Process file2 content:
foreach my $line (@file2) {
    chomp $line;
    # Retrieve "segment" from the line:
    my ($string) = $line =~ /^(.*?)\s+->/;
    # If it is present in file1, store it for future usage:
    if ($string and $first_file_hash{ $string }) {
        push @result, $line;
    }
}

open my $F, '>', 'output.txt';
print $F join("\n", @result);
close $F;

print "\nDone!\n";

Run as:

perl script.pl file1.txt file2.txt

Cheers!

robert.r
  • 31
  • 3
  • 1
  • This is a nice answer, but great answers also explain what they are doing, instead of dumping code. There are some issues with your code. One, you aren't using any error handling for `open`. I suggest to `use autodie` as a remedy. Two, your code is very inefficient. Instead of `push @result, ...`, you could print out that line directly! – amon Sep 20 '13 at 20:37
  • @amon - yes, obviously! It is not perfect, but it is not "production" code either. It was just an example. My intention was to outline a solution, focusing on retrieving the data. – robert.r Sep 20 '13 at 20:50
  • @amon - one more thing - if anything is not clear in my code feel free to ask! I thought it is almost self-documenting ;) – robert.r Sep 20 '13 at 21:01
1

This can be done very straightforwardly by creating a hash directly from the contents of the first file, and then reading each line of the second, checking the hash to see if it should be printed.

use strict;
use warnings;
use autodie;

my %permitted = do {
  open my $fh, '<', 'f1.txt';
  map { /(.+?)\s+\(/, 1 } <$fh>;
};

open my $fh, '<', 'f2.txt';
while (<$fh>) {
  my ($phrase) = /(.+?)\s+->/;
  print if $permitted{$phrase};
}

output

hello all -> salut (0.5)
hello all -> salut à tous (0.5)
hi all -> salut (0.63)
good bye all -> au revoir (0.09)
amon
Borodin
  • Thank you for all your replies. I am now using Borodin's version (thanks to him). @Borodin: how can I change it to print the result to a text file? `use strict; use warnings; use autodie; my $out = "result2.txt"; open outFile, ">$out" or die $!; my %permitted = do { open my $fh, '<', 'f1.txt'; map { /(.+?)\s+\(/, 1 } <$fh>; }; open my $fh, '<', 'f2.txt'; while (<$fh>) { my ($phrase) = /(.+?)\s+->/; print if $permitted{$phrase}; } close outFile;` – Poisson Sep 21 '13 at 13:20
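
Regarding the last comment: in the code posted there, the output file is opened but `print` is never given that filehandle, so the lines still go to standard output. One possible fix is to pass the handle to `print` explicitly. The sketch below assumes a helper called `print_permitted` (a name invented here, not Borodin's code) and uses in-memory filehandles so that it runs on its own; the commented lines show how it would probably be wired up to the real files named in the comment.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every rule line whose phrase is permitted
# from the input filehandle to the output filehandle.
sub print_permitted {
    my ( $rules_fh, $out_fh, $permitted ) = @_;
    while ( my $line = <$rules_fh> ) {
        my ($phrase) = $line =~ /(.+?)\s+->/;
        # The block form print {$fh} LIST sends output to that handle.
        print {$out_fh} $line if defined $phrase && $permitted->{$phrase};
    }
    return;
}

# With real files (names taken from the comment above) the calls would be roughly:
#   open my $in,  '<', 'f2.txt'      or die "Can't open f2.txt: $!";
#   open my $out, '>', 'result2.txt' or die "Can't open result2.txt: $!";
#   print_permitted( $in, $out, \%permitted );
#   close $out or die "Can't close result2.txt: $!";

# Self-contained demonstration with in-memory filehandles.
my %permitted = ( 'hello all' => 1 );
my $rules     = "hello all -> salut (0.5)\ngood bye -> au revoir  (0.09)\n";
my $result    = '';

open my $in,  '<', \$rules  or die "Can't open input: $!";
open my $out, '>', \$result or die "Can't open output: $!";
print_permitted( $in, $out, \%permitted );
close $out or die "Can't close output: $!";

print $result;    # prints "hello all -> salut (0.5)"
```

When `use autodie` is in effect, as in Borodin's version, the `or die` clauses are redundant and can be dropped. A lexical handle (`my $out`) with the three-argument form of `open` is generally preferred over a bareword such as `outFile`.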