
I'm trying to read through a set of text files twice, looking for a different value on each pass. However, `seek $fh, 0, 0` doesn't seem to work. Why?

Please help

My code:

 use strict;
 use warnings;
 ...
 read_in_data_employer();
 read_in_data_union();
 process_files ($FileFolder);
 close $FileHandle;
 ...
 sub process_files
 {
         opendir(my $dh, $FileFolder)
                 or die "Unable to open $FileFolder: $!";
         my @files = grep { /\.pdf\.txt$/ } readdir($dh);
         closedir($dh);
         @files = map { $FileFolder . '/' . $_ } @files;
         foreach my $file (@files)
         {
                 open(my $txtfile, '<', $file) or die "error opening $file: $!\n";
                 print "$file";
                 LookForEmployer:
                 {
                         print $FileHandle "\t";
                         while (my $line=<$txtfile>)
                         {
                                 foreach (@InputData_Employers)
                                 {
                                         if ($line =~ /\Q$_/i)
                                         {
                                                 print $FileHandle "$_";
                                                 last LookForEmployer;
                                         }
                                 }
                         }
                 }
                 seek($txtfile, 0, 0) or die "seek: $!\n";
                 LookForUnion:
                 {
                         print $FileHandle "\t";
                         while (my $line=<$txtfile>)
                         {
                                 print "$.\n";
                                 foreach (@InputData_Unions)
                                 {
                                         if ($line =~ /\Q$_/i)
                                         {
                                                 print $FileHandle "$_";
                                                 last LookForUnion;
                                         }
                                 }
                         }
                 }
                 close $txtfile;
         }
 }
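The file-listing step at the top of `process_files` can be tried on its own. This is a minimal sketch, assuming a hypothetical helper `list_pdf_txt_files` and a throwaway directory created with `File::Temp`. It uses a lexical directory handle (as ikegami's tip in the comments suggests) and escapes the dots in the filter, because the unescaped `/.pdf.txt/` also accepts names such as `xpdfxtxt`.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Hypothetical helper: list "*.pdf.txt" files in $folder using a lexical
# directory handle instead of the global bareword DIR.
sub list_pdf_txt_files {
    my ($folder) = @_;
    opendir(my $dh, $folder) or die "Unable to open $folder: $!\n";
    # Escaped dots, anchored at the end: the unescaped /.pdf.txt/ would
    # also accept a name like "xpdfxtxt".
    my @files = sort { $a cmp $b } grep { /\.pdf\.txt\z/ } readdir($dh);
    closedir($dh);
    return map { "$folder/$_" } @files;
}

# Made-up sample names in a temporary directory (removed at exit).
my $dir = tempdir(CLEANUP => 1);
for my $name ('a.pdf.txt', 'b.pdf', 'xpdfxtxt', 'c.pdf.txt') {
    open(my $out, '>', "$dir/$name") or die "create $name: $!\n";
    close($out);
}

my @found = list_pdf_txt_files($dir);
print "$_\n" for @found;
```

With these sample names only `a.pdf.txt` and `c.pdf.txt` are returned; `.` and `..` are filtered out by the same regex.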

Output:

>perl "test.pl" test "employers.txt" "unions.txt" output.txt
test/611-2643-03 (801-0741).pdf.txt12
13
14
15
16
17
18
19
20
21
22
test/611-2643-05 (801-0741).pdf.txt
7
8
9
10
11
12
test/611-2732-21 (805-0083).pdf.txt
2
3
4
5
6
7
8
test/611-2799-17 (801-0152).pdf.txt
6
7
8
9
10
11
12
13
14

Thanks

ikegami
  • What do you mean by "*`seek $fh, 0, 0` doesn't seem to work*"? Please explain why you think `seek` failed. Don't make us do work for nothing! – ikegami Aug 16 '18 at 13:30
  • If you wonder why seek failed, check for errors! `seek(...) or die("seek: $!\n");` – ikegami Aug 16 '18 at 13:30
  • Tip: Don't use global variables for nothing! Change `opendir(DIR, ... )` to `opendir(my $DIR, ...)` – ikegami Aug 16 '18 at 13:30
  • `$.` isn't very useful if you `seek` around. It simply counts the number of times `<>` returned. (It's not like files have an index of lines.) – ikegami Aug 16 '18 at 13:33
  • Hi ikegami. I tried adding `seek $txtfile, 0, 0 or die "Can't seek $txtfile: $!";` and got no error. – Vincent Lin Aug 16 '18 at 14:30
  • You skipped over the first and more important part of my comment, which I now repeat: What makes you think `seek` isn't working? – ikegami Aug 16 '18 at 14:31
  • I thought `$.` gives me the line numbers, and the output shows the second loop didn't start at line 1, as it should if `seek` worked. – Vincent Lin Aug 16 '18 at 14:32
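ikegami's point that `$.` only counts reads can be checked directly. A small sketch, assuming an in-memory filehandle (`open` on a scalar reference) stands in for one of the `.pdf.txt` files: it reads the whole file, rewinds with `seek`, and shows that `$.` keeps counting until it is reset by hand.

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );

# A made-up three-line "file" held in memory, so no disk I/O is needed.
my $text = "alpha\nbeta\ngamma\n";
open(my $fh, '<', \$text) or die "open: $!\n";

# First pass: read every line. $. ends at 3.
1 while <$fh>;
my $after_first_pass = $.;

# Rewind to byte 0. This moves the file position only; $. still says 3.
seek($fh, 0, SEEK_SET) or die "seek: $!\n";
my $first = <$fh>;
my $without_reset = $.;    # the counter continues from 3

# Rewind again, this time resetting the counter by hand.
seek($fh, 0, SEEK_SET) or die "seek: $!\n";
$. = 0;
$first = <$fh>;
my $with_reset = $.;       # counting starts over

print "after first pass: $after_first_pass\n";
print "after seek, no reset: $without_reset\n";
print "after seek and \$. = 0: $with_reset\n";
close($fh);
```

It prints 3, 4 and 1: the `seek` itself succeeded both times, only the counter differs.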

1 Answer

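Files don't have line numbers. They don't even have lines; they just have bytes. That means you can't ask the system "What line of the file is at this position?" `$.` is only a count of how many lines have been read from the handle, and `seek` moves the file position without touching that count.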

But since you're seeking to the start of the file, all you need to do is reset `$.` as well:

use Fcntl qw( SEEK_SET );

# Move the file position back to byte 0.
seek($txtfile, 0, SEEK_SET)
   or die("seek: $!\n");

# seek doesn't touch the line counter, so reset it too.
$. = 0;
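Applied to the question's two passes, the fix might look like the sketch below. `find_first_match` is a hypothetical helper, not part of the original code; the sample text and name lists are made up, and an in-memory filehandle stands in for a `.pdf.txt` file. Each call rewinds, resets `$.`, and returns the first name found plus the line it was on.

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );

# Hypothetical helper: rewind $fh, then return the first pattern in
# @$patterns found in a line (literal, case-insensitive), plus that
# line's number.
sub find_first_match {
    my ($fh, $patterns) = @_;
    seek($fh, 0, SEEK_SET) or die "seek: $!\n";
    $. = 0;    # seek moved the position only; reset the line counter too
    while (my $line = <$fh>) {
        for my $pat (@$patterns) {
            return ($pat, $.) if $line =~ /\Q$pat\E/i;
        }
    }
    return;    # nothing matched anywhere in the file
}

# Made-up sample text read from an in-memory file.
my $text = "Acme Corp letter\nmember of Local 42 union\nsigned by Acme Corp\n";
open(my $fh, '<', \$text) or die "open: $!\n";

my ($employer, $employer_line) = find_first_match($fh, ['Acme Corp', 'Globex']);
my ($union,    $union_line)    = find_first_match($fh, ['Local 7', 'Local 42']);
close($fh);

print "employer: $employer (line $employer_line)\n";
print "union: $union (line $union_line)\n";
```

Because the helper does its own rewind and reset, it can be called in any order without the second pass inheriting the first pass's line count.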

By the way, your program is insanely inefficient: for every line it loops over every name and recompiles a regex for each one. Load the data into hashes or into a database!
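ikegami recommends hashes or a database. The sketch below swaps in a related technique that is not from the answer: one precompiled, case-insensitive alternation per name list, plus a hash keyed by the lowercased name so a match can be mapped back to the name as written. `build_matcher` and the sample names are hypothetical. One behavioral difference from the question's loop: when a line contains several names, this returns the leftmost one in the line rather than the first one in list order.

```perl
use strict;
use warnings;

# Hypothetical helper: compile all names into one case-insensitive regex
# and keep a hash from lowercased name to the name as originally listed.
sub build_matcher {
    my @names = @_;
    my %canonical = map { (lc($_) => $_) } @names;
    # Longest names first, so "Local 420" wins over "Local 42" when both
    # could match at the same position in a line.
    my $alternation = join '|',
        map { quotemeta($_) } sort { length($b) <=> length($a) } @names;
    my $re = qr/($alternation)/i;
    return sub {
        my ($line) = @_;
        return $line =~ $re ? $canonical{ lc $1 } : undef;
    };
}

my $find_name = build_matcher('Local 42', 'Local 420', 'Acme Corp');
my @found = map { $find_name->($_) // 'none' }
    ('Signed for ACME CORP', 'Member of LOCAL 420', 'No names here');
print "$_\n" for @found;
```

With the sample lines this prints `Acme Corp`, `Local 420` and `none`. The regex is built once per list, instead of once per name per line.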

ikegami
  • Thanks for the help. How should I make the code more efficient? Load the entire text file into an array? – Vincent Lin Aug 16 '18 at 15:40
  • No, a hash keyed by what you are searching – ikegami Aug 16 '18 at 15:42
  • The `seek` doesn't work, because the result is different from closing and reopening the file: `close $txtfile; open ($txtfile, $file) or die "error opening $file\n"; # seek $txtfile, 0, 0 or die "Can't seek $txtfile: $!";` – Vincent Lin Aug 16 '18 at 15:46
  • That makes no sense. `seek` shouldn't do the same as opening and closing the file. It should change the file pointer to the specified position. And it did. – ikegami Aug 16 '18 at 15:48
  • You are right. In a later section of the code I wanted to read through the first 150 lines for dates. For some reason the code stopped at line 142 if I used `seek`. I increased the number to 160 and it works now. – Vincent Lin Aug 16 '18 at 20:35
  • How do I use a hash to look for a partial match within each line of the file? The examples I see (https://stackoverflow.com/questions/33373587/string-comparison-for-hash-keys-in-perl) only do full matches. – Vincent Lin Aug 16 '18 at 20:39
  • Not enough info provided. Also, not the place to ask new questions. – ikegami Aug 17 '18 at 16:29
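The close-and-reopen difference and the scan that stopped early are both consistent with `$.` not being reset after `seek`: an explicit `close` resets `$.`, while `seek` does not. A sketch under that assumption, with a hypothetical `first_n_lines` helper that stops on `$.` and a made-up 10-line in-memory file:

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );

# Hypothetical helper: rewind $fh and return up to its first $n lines,
# using $. as the stop condition. $reset says whether to zero $. after
# the seek.
sub first_n_lines {
    my ($fh, $n, $reset) = @_;
    seek($fh, 0, SEEK_SET) or die "seek: $!\n";
    $. = 0 if $reset;
    my @lines;
    while (my $line = <$fh>) {
        push @lines, $line;
        last if $. >= $n;
    }
    return @lines;
}

# A made-up 10-line in-memory file.
my $text = join '', map { "line $_\n" } 1 .. 10;
open(my $fh, '<', \$text) or die "open: $!\n";

# Simulate an earlier pass that read 3 lines, leaving $. at 3.
for (1 .. 3) { my $skipped = <$fh> }

my @stale = first_n_lines($fh, 5, 0);  # counting resumes at 4: 2 lines
my @fresh = first_n_lines($fh, 5, 1);  # counting restarts at 1: 5 lines

# An explicit close() resets $., so close-and-reopen also restarts at 1.
close($fh);
open($fh, '<', \$text) or die "open: $!\n";
my $first_line   = <$fh>;
my $after_reopen = $.;
close($fh);

printf "stale: %d lines, fresh: %d lines, \$. after reopen: %d\n",
    scalar(@stale), scalar(@fresh), $after_reopen;
```

Without the reset, the limit is hit early by exactly the number of lines the previous pass read, which would explain why raising the limit from 150 to 160 appeared to fix the date scan.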