
I'm uploading a tarball through a webpage, dropping it into /tmp/, then asking this script (which will be invoked via crontab) to:

1.) extract it

2.) build a list of all of the files (files only, recursively) in the directory

3.) search each file for a string and print the filename and the matching line to a file.

Everything is working up to the part where I want to build a list of files in the (extracted tarball) directory. If I don't put a "!" in front of the regex on line 6 in my code (matching only files that are .bak, .conf, .cfg), then I only get a dozen files in @filelist (as I'd expect, printed by the code on line 13).

However, if I put a "!" in front of my regex on line 6 (intended to match everything but those files), line 13 will print all filenames, including files with .bak, .conf, and .cfg extensions.

How can I get a collection of filenames in the (extracted tarball) directory except for those that I'm just not concerned about?

This is my code, roughly (stripped down, untested.) I'm a perl newb so I apologize for the ugliness of what I have here but it is what it is.

 1    sub loadFiles {
 2        my $dir=shift;
 3        find(\&recurDir,"$dir");
 4    }
 5    sub recurDir {
 6        if ( $File::Find::name =~ /(\.bak|\.conf|\.cfg)$/i ) {
 7            push @filelist, $File::Find::name;
 8        }
 9        print "$File::Find::name\n";
10    }
11    sub searcher {
12        my $file=$_;
13        print "Searching $file\n";
14    }
15    my $tarball = '/tmp/mytarball.tar.gz';
16    my $ae = Archive::Extract->new( archive=>$tarball ) || die ("$!");
17    $ae->extract( to=>$UPLOAD_DIR ) || die ("$ae->error");
18    my $dir_loc = File::Spec->catfile( $UPLOAD_DIR, $ae->files->[0]);
19    loadFiles("$dir_loc");
20    find(\&searcher, @filelist);
harperville

2 Answers


With the negated regex, directories pass the test too, so you're adding directories to @filelist at line 7. The find at line 20 then walks each of those directories, and line 13 prints every file in them and their subdirectories.

Line 6 should be:

if ( -f $File::Find::name && $File::Find::name !~ /\.(?:bak|conf|cfg)\z/i ) {

Line 20 should be:

searcher($_) for @filelist;

searcher should be:

sub searcher {
   my ($file) = @_;
   print "Searching $file\n";
}

Avoiding global vars, the whole looks like:

sub loadFiles {
    my $dir=shift;

    my @filelist;
    my $wanted = sub {
        return if $File::Find::name =~ /\.(?:bak|conf|cfg)\z/i;
        return if !-f $File::Find::name;
        push @filelist, $File::Find::name;
    };

    find($wanted, $dir);
    return @filelist;
}

sub searcher {
    my $file=shift;
    print "Searching $file\n";
}

searcher($_) for loadFiles($dir_loc);

(Technically, you could do searcher($File::Find::name); directly instead of pushing it to an array then later looping over the array.)
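
A minimal sketch of that direct approach, which also covers the "search each file for a string" step from the question. The `search_tree` name, the two-argument `searcher`, and the `path:line: text` output format are assumptions made for illustration, not part of the original code. It passes `no_chdir => 1` to `find` so that `$File::Find::name` stays valid for `-f` and `open` even when the starting directory is a relative path.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: returns "path:line_no: text" for every line of $file
# that contains the literal string $needle.
sub searcher {
    my ($file, $needle) = @_;
    my @hits;
    open my $fh, '<', $file
        or do { warn "Cannot open $file: $!\n"; return };
    while (my $line = <$fh>) {
        chomp $line;
        push @hits, "$file:$.: $line" if index($line, $needle) >= 0;
    }
    close $fh;
    return @hits;
}

# Walks $dir, skips anything that is not a plain file or that ends in
# .bak/.conf/.cfg, and searches each remaining file as soon as it is found.
sub search_tree {
    my ($dir, $needle) = @_;
    my @hits;
    find({
        no_chdir => 1,
        wanted   => sub {
            return if !-f $File::Find::name;
            return if $File::Find::name =~ /\.(?:bak|conf|cfg)\z/i;
            push @hits, searcher($File::Find::name, $needle);
        },
    }, $dir);
    return @hits;
}
```

It could then be called as `print "$_\n" for search_tree($dir_loc, 'some string');`, redirecting the output to a file as needed.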

ikegami
  • I'm confused about where `searcher($_) for @filelist;` goes. When I put it at Line 20, I get "Searching " printed to the screen, which is a blank value for $file. I like the rewrite for line 6, though. That's pretty. – harperville Jan 09 '13 at 21:09
  • Did you change `searcher` as mentioned? btw, always use `use strict; use warnings;`! – ikegami Jan 09 '13 at 21:10
  • Other than your ") {" typo, the above code gave me the list of files I'd expect to see. In the first code, you have `my ($file)=@_;` but in your revised snippet, you have `my $file=shift`. What do the parens do in one scenario versus the other? Thanks for the help! – harperville Jan 09 '13 at 21:23
  • Because it's `@_` on the RHS of one and `shift` on the RHS of the other. `shift` is a premature optimisation that reduces readability, so I don't use it. I used it in the final code because it's what you'd use. – ikegami Jan 09 '13 at 21:43
  • I see the difference but don't know why the difference...why the parens with `@_` and not with `shift`? – harperville Jan 09 '13 at 21:49
  • Parens cause the list assignment operator to be used. The list assignment operator evaluates its operands in list context. The scalar assignment operator, on the other hand, evaluates its operands in scalar context. That would cause `@_` to return something undesirable (its number of elements). – ikegami Jan 10 '13 at 00:07
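
The difference discussed in these comments can be seen in a small sketch. The three sub names are made up for illustration; each receives the same two arguments and differs only in how it reads `@_`.

```perl
use strict;
use warnings;

# List assignment: the parens put @_ in list context,
# so $file receives the first argument.
sub first_arg_list   { my ($file) = @_;  return $file }

# Scalar assignment: @_ is evaluated in scalar context,
# which yields the number of arguments instead.
sub first_arg_scalar { my $file = @_;    return $file }

# shift with no operand removes and returns the first element of @_.
sub first_arg_shift  { my $file = shift; return $file }

print first_arg_list('a.txt', 'b.txt'),   "\n";  # a.txt
print first_arg_scalar('a.txt', 'b.txt'), "\n";  # 2
print first_arg_shift('a.txt', 'b.txt'),  "\n";  # a.txt
```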
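my $tarball = '/tmp/mytarball.tar.gz';
my $ae = Archive::Extract->new( archive=>$tarball ) || die ("$!");
my @files;
# A method call does not interpolate inside a string, so call it directly.
$ae->extract( to=>$UPLOAD_DIR ) || die $ae->error;
for my $file (@{$ae->files}) {
  # Use !~ instead of =~ to keep everything except these extensions.
  push @files, $file if $file =~ /(\.bak|\.conf|\.cfg)$/i;
}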
alex
  • This looks so nice and tidy but it didn't work for me. I implemented the code, adding `newSearcher($_) for @files;` where newSearcher is basically an assignment using `shift`, then a `print`. When it does work, I have the full path to the file. Your code is only giving me the file name, not the full path. – harperville Jan 09 '13 at 21:34
  • It should be a path relative to the archive root. – alex Jan 09 '13 at 22:25
  • http://search.cpan.org/~bingos/Archive-Extract-0.60/lib/Archive/Extract.pm#$ae->files – alex Jan 09 '13 at 22:25
  • This is an array ref with the paths of all the files in the archive, relative to the to argument you specified. To get the full path to an extracted file, you would use: File::Spec->catfile( $to, $ae->files->[0] ); – alex Jan 09 '13 at 22:26
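
Applying that `catfile` call to every entry, rather than only the first, might look like the sketch below. The `wanted_paths` helper is hypothetical, and it filters out the unwanted extensions (the question's goal) rather than selecting them.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: drops archive-relative paths ending in .bak/.conf/.cfg,
# then prefixes each remaining path with the extraction directory $to.
sub wanted_paths {
    my ($to, @relative) = @_;
    return map  { File::Spec->catfile($to, $_) }
           grep { !/\.(?:bak|conf|cfg)\z/i } @relative;
}
```

With the answer's variables this would be called as `my @files = grep { -f } wanted_paths($UPLOAD_DIR, @{ $ae->files });`. The `grep { -f }` is there on the assumption that the list may also contain directory entries.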