
I am working on a Perl script to organize a folder that holds all of our order documents. The script works for the most part, except for a curveball someone threw at me the other day.

The problem arises when we have an order that we have recently redone. We are land surveyors, and sometimes we will do a survey and then, a few years later, do what is called a "flyby": we go back and "append" the order with another file, either noting changes to the land or simply stating that everything is okay and nothing has changed.

Where this causes a problem for me is that the new file we make has the same order/document number as the older document. For example, we might have a document named CF145323; that document would then have several single-page PDF files named CF145323.pdf, *_1.pdf, *_2.pdf and so on.

What I am looking for is a way to modify my script to count the files it finds and determine the next file number in the sequence. So if there were files *_1.pdf through *_3.pdf, I want Perl to take the mismatched file and make it *_4.pdf. Follow me?

The other catch is that the files are sometimes in different folders that do not match the first number in the file name. That part I seem to have figured out; it is just the numbering I don't have worked out.

Also, I am working in Windows, so I cannot use any Linux commands.

Here is my script in the last state I left it:

#!/usr/bin/perl
use strict;
use warnings;

# Root folder for Order Documents
my $orders_root = "C:\\Users\\Ian\\Desktop\\Order_docs";

# Keep track of how many files are processed
my $files_counter = 0;

# Keep track of how many junk files are processed
my $junk_counter = 0;

# Store a list of folders that match the 3 number naming scheme
my @matched_folders;

# Create a place to move junk files into
if (! -e "$orders_root\\Junk") {

    system "mkdir $orders_root\\Junk";

}

# Clear the screen
system "cls";

print "Processing files, please wait...\n\n";

# Open $orders_root
opendir(ORDERS_ROOT, "$orders_root") or die $!;

# Collect a list of all sub folders
my @folders = readdir(ORDERS_ROOT);

# Close $orders_root
closedir(ORDERS_ROOT);

# Remove the directories "." and ".." from the output
splice @folders, 0, 2;

foreach my $folder (@folders) {

    # Filter out all directories that don't match the numbering system
    if ($folder =~ / \d{3} /xm) {

        # If the folder matches the expression above, add it to the list of matched folders
        push @matched_folders, $folder;
    
        # Open each folder inside of the Order Documents root
        opendir(CURRENT_FOLDER, "$orders_root\\$folder");

        # Foreach folder opened, collect a list of files in the folder for sorting
        my @files = readdir(CURRENT_FOLDER);

        # Close the current folder
        closedir(CURRENT_FOLDER);

        # Remove the directories "." and ".." from the output
        splice @files, 0, 2;

        foreach my $file (@files) {

            # Match each file to the standard naming scheme
            if ($file =~ /^ (C[AFL]|ME) \d{3} \d{3}([_|\-] \d+)? \. pdf /xmi) {

                ++$files_counter;
            
            # If that file does not match, move it to a junk folder
            } else {
            
                ++$junk_counter;

                rename ("$orders_root\\$folder\\$file", "$orders_root\\Junk\\$file");

            } # End pdf match           

        } # End foreach $file
        
    } # End folder match

} # End foreach $folder



foreach my $folder (@matched_folders) {
    
    # Open $folder
    opendir(CURRENT_FOLDER, "$orders_root\\$folder");

    # Collect a list of all files in the folder
    my @files = readdir(CURRENT_FOLDER);

    # Close $folder
    closedir(CURRENT_FOLDER);

    splice @files, 0, 2;
    
    foreach my $file (@files) {
        
        if ($file =~ /^ (?<office> (C[AFL]|ME)) (?<folder_num> \d{3}) (?<file_num> \d{3}([_|\-] \d+)?) \. (?<file_ext> pdf) /xmi) {
        
            my $office = uc($+{office});
            my $folder_num = $+{folder_num};
            my $file_num = $+{file_num};
            my $file_ext = lc($+{file_ext});
            
            # Change hyphens to a underscore
            $file_num =~ s/\-/_/;
            
            my $file_name = "$office" . "$folder_num" . "$file_num" . "\." . "$file_ext";
            my $fly_by_name = "$office" . "$folder_num" . "$file_num" . "_FB" . "\." . "$file_ext";
            
            # Check if the current file belongs in the current folder
            if ($folder != $folder_num) {

                # If the folder does not exist create the folder
                if (! -e "$orders_root\\$folder_num") {
                
                    system "mkdir $orders_root\\$folder_num";
                    
                }
                
                # Check to see if the file already exists
                if (! -e "$orders_root\\$folder_num\\$file_name") {
                
                    # Moves the file to correct place, these are mismatched files
                    rename ("$orders_root\\$folder\\$file", "$orders_root\\$folder_num\\$file_name");
                
                } else {
                
                    # Appends the file with a "_#" where # is equal to the 1+ the last file number, these files are fly bys
                    rename ("$orders_root\\$folder\\$file", "$orders_root\\$folder_num\\$fly_by_name");
                
                }
            
            # Files are in the correct place, the file name will be corrected only
            } else {
            
                rename ("$orders_root\\$folder\\$file", "$orders_root\\$folder_num\\$file_name");
            
            }
        
        } # End $file match
        
    } # End foreach $file

} # End foreach $folder



# Show statistics after processing
print "Done!\n\n";
print scalar(@folders) . " folders processed\n";
print "$files_counter files processed\n";
print "$junk_counter junk files removed\n";
HoldOffHunger
AtomicPorkchop

1 Answer


Your script is rather large to wade through, but I suggest a different approach.

First, and most obvious, is something like this:

use strict;
use warnings;

my $base = "CF145323";
my $num  = 1;

# Advance $num until "${base}_$num.pdf" no longer exists
$num++ while -f "${base}_$num.pdf";

my $filename = "${base}_$num.pdf";
print "$filename\n";

In other words, check whether the file already exists. You would have to modify this to test the various directories you hold the files in. Note that if there are gaps in the numbering sequence, the loop stops at the first gap rather than after the highest existing number.
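One way to cover several directories, and to be safe against gaps, is to scan each folder and keep the highest suffix found. The following is only a rough sketch: the helper name `next_suffix` and the folder names `145` and `146` are made up for illustration, and the root path is simply the one from the question's script.

```perl
use strict;
use warnings;

# Hypothetical helper: scan every directory in @dirs for "<base>_N.pdf"
# (or "<base>-N.pdf") and return one more than the highest N found.
sub next_suffix {
    my ($base, @dirs) = @_;
    my $max = 0;
    for my $dir (@dirs) {
        opendir(my $dh, $dir) or next;    # skip folders that cannot be read
        for my $file (readdir $dh) {
            # \Q...\E quotes the base name; the suffix is captured in $1
            if ($file =~ /^\Q$base\E[_-](\d+)\.pdf$/i) {
                $max = $1 if $1 > $max;
            }
        }
        closedir $dh;
    }
    return $max + 1;
}

# Assumed layout: root path from the question, folder names are examples only
my $orders_root = "C:\\Users\\Ian\\Desktop\\Order_docs";
my @dirs = map { "$orders_root\\$_" } qw(145 146);

my $next = next_suffix("CF145323", @dirs);
print "CF145323_$next.pdf\n";
```

Because it takes the maximum rather than the first free number, a missing *_2.pdf would not cause *_2 to be reused. It uses only opendir/readdir, so it does not depend on any Linux commands.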

It might be easier to keep a record of each file and its latest generation. Typically that would be a hash with, for example, 'CF145323' as the key and the latest version number as its value. The hash can be saved and restored using the Storable module (very easy to use, and part of the Perl core distribution).
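As a minimal sketch of the hash idea, assuming the counters live in a file called `order_versions.sto` in the current directory (both that file name and the helper `next_version` are hypothetical), Storable's `store` and `retrieve` could be used like this:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Hypothetical file that holds the saved counters between runs
my $state_file = "order_versions.sto";

# Load the saved hash if it exists, otherwise start with an empty one
my $latest = -e $state_file ? retrieve($state_file) : {};

# Hypothetical helper: increment and return the latest suffix for an order
sub next_version {
    my ($versions, $order) = @_;
    return ++$versions->{$order};
}

my $n = next_version($latest, "CF145323");
print "CF145323_$n.pdf\n";

# Save the updated counters for the next run
store($latest, $state_file);
```

The trade-off is that the saved hash has to stay in sync with what is actually on disk, so files added or renamed by hand would not be counted.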

cdarke
  • Hmm, interesting idea. This might work, and there "should" not be any gaps in our numbering because the numbers are auto-generated. – AtomicPorkchop Jun 29 '11 at 14:38