I have a directory that holds ~5000 .txt files, each of size 2,400.

I just want one filename from that directory; order does not matter.

The file will be processed and deleted.

This is not the script's working directory.

The intention is:

  • to open that file,
  • read it,
  • do some stuff,
  • unlink it and then
  • loop to the next file.

My crude attempt does not check for only .txt files and also has to get all ~5000 filenames just for one filename. I am also possibly calling too many modules?

The Verify_Empty sub was intended to validate that the directory exists and that there are files in it, but my attempts are failing, so here I am seeking assistance.

#!/usr/bin/perl -w
use strict;
use warnings;
use CGI;
use CGI ':standard';
print CGI::header();
use CGI::Carp qw(fatalsToBrowser warningsToBrowser);
###
use vars qw(@Files $TheFile $PathToFile);
my $ListFolder = CGI::param('openthisfolder');
Get_File($ListFolder);
###
sub Get_File{
  $ListFolder = shift;
  unless (Verify_Empty($ListFolder)) {
    opendir(DIR,$ListFolder);
    @Files = grep { $_ ne '.' && $_ ne '..' } readdir(DIR);
    closedir(DIR);
    foreach(@Files){
      $TheFile = $_;
    }
    #### This is where I go off to process and unlink file (sub not here) ####
    $PathToFile = $ListFolder.'/'.$TheFile;
    OpenFileReadPrepare($PathToFile); 
    #### After unlinked, the OpenFileReadPrepare sub loops back to this script. 
  }
  else {
    print qq~No more files to process~;
    exit;
  }
  exit;
}
    ####
sub Verify_Empty {
  $ListFolder = shift;
  opendir(DIR, $ListFolder) or die "Not a directory";
  return scalar(grep { $_ ne "." && $_ ne ".." } readdir(DIR)) == 0;
  closedir(DIR);
}

Obviously I am very new at this. This method seems quite "hungry"? Seems like a lot to grab one filename and process it! Guidance would be great!

EDIT - Latest Attempt

my $dir = '..';
my @files = glob "$dir/*.txt";
for (0..$#files){
$files[$_] =~ s/\.txt$//;
}
my $PathAndFile =$files[0].'.txt';
print qq~$PathAndFile~;

This "works", but it still gets all the filenames. None of the examples here, so far, have worked for me. I guess I will live with this for today until I figure it out. Perhaps I will revisit and see if anyone came up with anything better.

  • Define "first"? It sounds very strange to say that order does not matter when also mentioning first and last. Is this just a very complicated way of saying that you want to iterate over all the files? – TLP May 08 '13 at 15:00
  • Rather than `use vars ( ... $TheFile .. )` and `foreach(@Files){ $TheFile = $_;` I prefer to write `foreach my $TheFile (@Files){`. Has the advantage of giving the variable the smallest scope and not using `$_`. – AdrianHHH May 08 '13 at 15:32
  • The parameters to calls of `Get_File` and `Verify_Empty` are not needed as `$ListFolder` is in scope within both; their first assignments just overwrite the variable with the value it already contains. – AdrianHHH May 08 '13 at 15:32
  • First and last do not matter. I DO NOT want to iterate over all the files. Just want one file and it does not matter which one. – OldDogLearningNewPerlTricks May 08 '13 at 17:14
  • But, @Student33, the bulleted list in your question states "[then] loop to the next file". What does that phrase mean if not that all files are to be processed? – AdrianHHH May 08 '13 at 18:28
  • That is handled by another sub. I was only wanting to get one filename from directory without loading them all. Explaining the entire scenario sometimes helps. Also, my question was edited a bit so, looks like those details were deleted. – OldDogLearningNewPerlTricks May 08 '13 at 20:50
  • @Student33 You will find that explaining ALL of your "scenario" rather than part of it helps more than a little. Your current attempt which first gets all the files ending with .txt, then deletes all the .txt endings, then adds .txt ending again... is rather.... weird. Just... `my $file = glob "$dir/*.txt"`. Don't put them in an array, change the array, take the first element of the array and change it back. That's just dumb. – TLP May 08 '13 at 21:58
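
The one-line glob suggested in the comment above can also be written as a list assignment, which keeps only the first match. The following is a minimal sketch, not the commenter's exact code: the helper name `first_txt_via_glob` and the throwaway demo directory are assumptions made for illustration. Note that glob still expands the whole pattern internally, so this is short rather than lean.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# first_txt_via_glob is a hypothetical helper name, not from the question.
sub first_txt_via_glob {
    my ($dir) = @_;
    # List assignment keeps only the first match. glob still expands the
    # whole pattern, and it splits the pattern on whitespace, so this
    # assumes $dir contains no spaces.
    my ($first) = glob("$dir/*.txt");
    return $first;    # undef when nothing matches
}

# Demo against a throwaway directory so the sketch runs anywhere.
my $glob_demo_dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt notes.log)) {
    open(my $fh, '>', "$glob_demo_dir/$name") or die "Cannot create $name: $!";
    close($fh);
}
my $picked = first_txt_via_glob($glob_demo_dir);
print defined $picked ? "picked $picked\n" : "no .txt files\n";
```

The returned value already includes the directory part, so it can be passed straight to open or unlink.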

3 Answers

You could call readdir inside a while loop. That way readdir does not return all the files at once but gives you only one at a time:

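# opendir(DIR, ...);
my $first_file = "";
# defined() keeps the loop going even for a file literally named "0"
while (defined(my $file = readdir(DIR))) {
  next if $file eq "." or $file eq "..";  # skip the self and parent entries
  $first_file = $file;
  last;                                   # stop after the first real entry
}
print "$first_file\n"; # first file in directory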
mpapec
  • I could not get this to return any filenames. The files are not in the same directory as the script, but one dir down. – OldDogLearningNewPerlTricks May 08 '13 at 21:02
  • @Student33, you're doing something wrong. This code snippet does not specify which directory is opened. Moreover, it is essentially correct. – pilcrow May 09 '13 at 03:11
  • You'll need to opendir first, i.e. `my $ListFolder = CGI::param('openthisfolder'); opendir(DIR, $ListFolder);` – mpapec May 09 '13 at 06:27

You're calling readdir in list context, which returns all of the directory entries. Call it in scalar context instead:

my $file;
while( my $entry = readdir DIR ) {
    $file = $entry, last if $entry =~ /\.txt$/;
}

if ( defined $file ) {
    print "found $file\n";
    # process....
}

Additionally, you read the directory twice; once to see if it has any entries, then to process it. You don't really need to see if the directory is empty; you get that for free during the processing loop.
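
Putting these points together, a single pass over the directory can both find a file and report that none are left. The following is a minimal sketch under assumptions: the sub name `next_txt_file`, the lexical directory handle, and the throwaway demo directory are illustrative and not taken from the question. Because readdir returns bare names, the sketch prepends the directory before returning the path.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# next_txt_file is a hypothetical name; it is not part of the question's code.
sub next_txt_file {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my $found;
    while (defined(my $entry = readdir($dh))) {
        next unless $entry =~ /\.txt\z/;    # also skips '.' and '..'
        next unless -f "$dir/$entry";       # ignore directories named *.txt
        $found = "$dir/$entry";             # readdir returns bare names
        last;                               # one file is all we need
    }
    closedir($dh);
    return $found;    # undef means there was nothing left to process
}

# Demo against a throwaway directory so the sketch runs anywhere.
my $scan_demo_dir = tempdir(CLEANUP => 1);
open(my $sample_fh, '>', "$scan_demo_dir/sample.txt")
    or die "Cannot create file: $!";
close($sample_fh);

my $path = next_txt_file($scan_demo_dir);
if (defined $path) {
    print "found $path\n";
}
else {
    print "No more files to process\n";
}
```

With this shape the undef return takes over the role of a separate emptiness check, so the directory is opened only once per call.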

Diab Jerius
  • I could not get this to return any filenames. The files are not in the same directory as the script, but one dir down. I tried to modify it to no avail. – OldDogLearningNewPerlTricks May 08 '13 at 21:02
  • @Student33, this code snippet, too, does not specify which directory to open, and is materially correct. You're doing something wrong in adapting it. – pilcrow May 09 '13 at 03:11
  • The tests for `'.'` and `'..'` equality are not needed, since you are already checking `/\.txt/` anyway. – pilcrow May 09 '13 at 03:12

Unless I am greatly mistaken, what you want is just to iterate over the files in a directory, and all this about "first or last" and "order does not matter" and deleting files is just confusion about how to do this.

So, let me put it in a very simple way for you, and see if that actually does what you want:

my $directory = "somedir";
for my $file (<$directory/*.txt>) {
    # do stuff with the files
}

The glob does what a *nix shell would: it lists the files with the .txt extension. If you want to do further tests on the files inside the loop, that is perfectly fine.

The downside is keeping 5000 file names in memory, and also that if processing this file list takes time, there is a possibility that it conflicts with other processes that also access these files.

An alternative is to simply read the file names with readdir in a while loop, as mpapec mentioned in his answer. The benefit is that each time you read a new file name, the file will be there. Also, you won't have to keep a large list of files in memory.
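
The readdir alternative could be sketched as follows, processing and deleting each file as its name is read. This is an illustration under assumptions rather than a definitive implementation: `process_all_txt` and its `$handler` callback are hypothetical names standing in for whatever processing the question's own sub performs.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# process_all_txt and its $handler callback are hypothetical names.
sub process_all_txt {
    my ($dir, $handler) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my $count = 0;
    while (defined(my $entry = readdir($dh))) {
        next unless $entry =~ /\.txt\z/;
        my $path = "$dir/$entry";
        next unless -f $path;    # may have been removed by another process
        $handler->($path);       # caller-supplied processing step
        unlink($path) or warn "Could not unlink '$path': $!";
        $count++;
    }
    closedir($dh);
    return $count;               # number of files handled in this pass
}

# Demo against a throwaway directory so the sketch runs anywhere.
my $loop_demo_dir = tempdir(CLEANUP => 1);
for my $name (qw(one.txt two.txt keep.log)) {
    open(my $fh, '>', "$loop_demo_dir/$name") or die "Cannot create $name: $!";
    close($fh);
}
my $handled = process_all_txt($loop_demo_dir, sub { print "processing $_[0]\n" });
print "handled $handled file(s)\n";
```

Only one file name is held at a time. Behaviour when entries are deleted in the middle of a scan can vary by filesystem, so calling the sub again until it returns 0 is a cautious option.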

TLP