
I'm simply trying to get a list of filenames from a path that contains wildcards.

my $path = "/foo/bar/*/*.txt";
my @file_list = glob($path);
foreach my $current_file (@file_list) {
   print "\n- $current_file";
}

This mostly works perfectly, but if there is a file larger than 2GB somewhere in one of the /foo/bar/* subpaths, glob returns an empty array without any error or warning.

If I remove the file, or add a character/bracket sequence like this:

my $path = "/foo/bar/*[0-9]/*.txt";

or

my $path = "/foo/bar/*1/*.txt";

then the glob works again.

UPDATE:

Here's an example (as a matter of business policy I had to mask the pathnames):

[root]/foo/bar # ls -lrt
drwxr-xr-x    2 root     system         256 Oct 11 2006  lost+found
drwxr-xr-x    2 root     system         256 Dec 27 2007  abc***
drwxr-xr-x    2 root     system         256 Nov 12 15:32 cde***
-rw-r--r--    1 root     system  2734193149 Nov 15 05:07 archive1.tar.gz
-rw-r--r--    1 root     system     6913743 Nov 16 05:05 archive2.tar.gz
drwxr-xr-x    2 root     system         256 Nov 16 10:00 fgh***
[root]/foo/bar # /home/user/test.pl
[root]/foo/bar #

Removing the >2GB file (or globbing with "/foo/bar/[acf]*/*" instead of "/foo/bar/*/*"):

[root]/foo/bar # ls -lrt
drwxr-xr-x    2 root     system         256 Oct 11 2006  lost+found
drwxr-xr-x    2 root     system         256 Dec 27 2007  abc***
drwxr-xr-x    2 root     system         256 Nov 12 15:32 cde***
-rw-r--r--    1 root     system     6913743 Nov 16 05:05 archive2.tar.gz
drwxr-xr-x    2 root     system         256 Nov 16 10:00 fgh***

[root]/foo/bar # /home/user/test.pl
- /foo/bar/abc***/heapdump.phd.gz
- /foo/bar/cde***/javacore.txt.gz
- /foo/bar/fgh***/stuff.txt
[root]/foo/bar #

Any suggestion?

I'm working with Perl 5.8.8 on AIX 5.3. The filesystem is a local JFS.

roovalk
  • Is this the actual, entire program that gave the problem? The only reason I'm asking is that an earlier `glob` could affect what a later `glob` returns. – ikegami Nov 16 '12 at 09:16
  • The problem came up in more complex code, but the issue is perfectly replicated by this snippet. – roovalk Nov 16 '12 at 09:26
  • Can you show the smallest possible reproducible test case, with actual data files, somewhere? – mvp Nov 16 '12 at 09:30
  • I have a small update on the issue. The problem seems to be strictly connected with any file greater than 2GB in one of the subpaths. – roovalk Nov 16 '12 at 10:10
  • What's the output of `perl -V:uselargefiles` (capital "V")? – ikegami Nov 16 '12 at 10:50
  • Can you stat() a file > 2GB? Although why glob would call stat on an intermediate file escapes me at the moment. – Richard Huxton Nov 16 '12 at 11:02
  • I can stat() the file without any problem: `dev: 2555911 ino: 7 mode: 33188 nlick: 1 uid: 0 gid: 0 rdevr: 0 size: 2734193149 atime: 1353013680 mtime: 1352952423 ctimer: 1353060761 block size: 4096 blocks: 5340224` – roovalk Nov 16 '12 at 11:32
  • @roovalk, was that the Perl `stat()` command on the file, or system command? Try the Perl one if it wasn't. This smells like a bug or problem related to large file support. Your Perl version is quite old; I would definitely suggest upgrading if possible. – dan1111 Nov 16 '12 at 15:31
  • It's simply the printed result of `($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks) = stat($filename);`. Using a "younger" version than 5.8.8 is actually not possible. This script needs to run on a wide group of machines with different patch levels, releases and versions, and my department cannot "force" the upgrade. – roovalk Nov 16 '12 at 16:03
  • One way to work around it would be to implement what you want manually without `glob`, using `opendir` and `readdir` instead. It's not as nice as `glob`, but if `glob` is broken for some reason, `readdir` should be one way to avoid that. See `perldoc -f readdir` for more info with a good example. – Kent Fredric Nov 18 '12 at 06:38
  • Also, please state in the question how important it is to use `glob` stars and so forth: do you merely need to locate `.txt` files, or do you need users to specify a file glob to locate? If `glob` was just a means to an end, then we might be able to eliminate the requirement; however, if your clients/users need to provide the globs, then it's a different problem. – Kent Fredric Nov 18 '12 at 06:41
  • The users need to provide the path for the globs. I'm not strictly forced to use glob, but personally I don't know any other way to handle a double * wildcard. The directory structure to search can be quite elaborate (something like /foo/bar/*/abc/*/*.out). – roovalk Nov 20 '12 at 08:06
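
The `opendir`/`readdir` approach suggested in the comments above could be sketched roughly as follows. This is a minimal sketch, not a drop-in replacement for `glob`: `component_regex` and `expand_pattern` are hypothetical helper names, only `*` and `?` are handled (no brackets, braces or `~`), and it assumes that `readdir` itself is unaffected by whatever breaks `glob` on this platform, which has not been verified on AIX. It never calls `stat`; entries that cannot be opened as directories are silently skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one path component into an anchored regex.
# Only * and ? are treated as wildcards; everything else is literal.
sub component_regex {
    my ($component) = @_;
    my $re = '';
    for my $char (split //, $component) {
        $re .= $char eq '*' ? '.*'
             : $char eq '?' ? '.'
             :                quotemeta($char);
    }
    return qr/\A$re\z/s;
}

# Hypothetical helper: expand a pattern such as "/foo/bar/*/abc/*/*.out"
# one path component at a time, using only opendir/readdir.
sub expand_pattern {
    my ($pattern) = @_;

    # '/' marks an absolute pattern, '' a pattern relative to the cwd.
    my @paths = ($pattern =~ m{\A/} ? '/' : '');

    for my $component (grep { length $_ } split m{/}, $pattern) {
        my $regex       = component_regex($component);
        my $want_hidden = $component =~ /\A\./;   # like glob, * skips dotfiles
        my @next;

        for my $dir (@paths) {
            # Regular files (including very large ones) simply fail here.
            opendir(my $dh, $dir eq '' ? '.' : $dir) or next;
            my @names = readdir $dh;
            closedir $dh;

            for my $name (sort @names) {
                next if $name =~ /\A\./ && !$want_hidden;
                next unless $name =~ $regex;
                push @next, $dir =~ m{\A/?\z} ? "$dir$name" : "$dir/$name";
            }
        }
        @paths = @next;
    }
    return @paths;
}

print "- $_\n" for expand_pattern("/foo/bar/*/abc/*/*.out");
```

Because every component, literal or not, is matched against a directory listing, each directory along the path must be readable; a production version would probably append literal components directly instead.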

2 Answers


In the absence of a proper answer you're going to want a work-around. I'm guessing you've hit some platform-specific bug in the glob() implementation of 5.8.8.

I had a quick look at the source on CPAN but my C is too rusty to spot anything useful.

There have been lots of changes to that module though, so a bug may well have been reported and fixed. You're not even on the last release of 5.8 - there's a 5.8.9 out there which mentions updates to AIX compatibility and File::Glob.

I'd test this by installing local::lib if you haven't already and then perhaps cpanm and try updating File::Glob - see what that does. You might need to download the files by hand from e.g. here

If that solves the problem then you can either deploy updates to the required systems, or you'll have to re-implement the bits of glob() you want. Which is going to depend on how complex your patterns get.

If it doesn't solve the problem then at least you'll be able to stick some printf's into the code and see what it's doing.
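
Before patching print statements into the module, it may be worth checking two things from Perl itself: whether the interpreter was built with large-file support (the `perl -V:uselargefiles` question from the comments), and whether `File::Glob` records an error for the failing pattern. The following is a minimal sketch under assumptions: `checked_glob` is a hypothetical helper name, while `bsd_glob` and `GLOB_ERROR` are exported by `File::Glob`. Whether `GLOB_ERROR` is actually set in the AIX case described here is a guess that would need to be confirmed on an affected machine.

```perl
use strict;
use warnings;
use Config;
use File::Glob qw(bsd_glob GLOB_ERROR);

# Was this perl built with large-file support? ('define' if so.)
my $large = defined $Config{uselargefiles} ? $Config{uselargefiles} : 'undef';
print "uselargefiles: $large\n";

# Hypothetical helper: run bsd_glob with its default flags and report
# whether File::Glob flagged an error for this call.
# Note: unlike the built-in glob, bsd_glob does not split on whitespace.
sub checked_glob {
    my ($pattern) = @_;
    my @files = bsd_glob($pattern);
    if (GLOB_ERROR) {
        # Per the File::Glob documentation, $! is set when an error occurred.
        warn "glob of '$pattern' reported error code " . GLOB_ERROR . ": $!\n";
    }
    return @files;
}

print "- $_\n" for checked_glob("/foo/bar/*/*.txt");
```

If a warning appears only while the large file is present, that would at least distinguish "no matches" from "glob aborted", which the plain `glob` call cannot do.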

Hopefully someone will post a real answer and make this redundant about 5 minutes after I click "Post Your Answer" though.

Richard Huxton
  • Thanks for the details concerning AIX compatibility with File::Glob. I think I'll try some different built-in function, though. This script needs to be distributed to a large number (> 200) of machines with slightly different environments and a jungle of patches, APAR fixes, etc. – roovalk Nov 20 '12 at 08:12

I've never used the new glob function before, so I can't comment on its benefits or problems, but it seems quite a lot of people have had issues using it; see https://stackoverflow.com/search?q=perl+glob&submit=search for some questions and possible solutions.

If you don't mind trying something else, here is my tried and tested 'old school' Perl solution, which I have used in countless projects:

my $path = "/foo/bar/";
chomp(my @result_array = qx(find $path -iname '*.txt')); # run the system find command; chomp strips each trailing newline

If, for whatever reason, you prefer not to run a system command from within your script, then look up the built-in File::Find module instead: http://search.cpan.org/~dom/perl-5.12.5/lib/File/Find.pm

Good luck.
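
As a rough illustration of the File::Find route, the sketch below emulates the question's "/foo/bar/*/*.txt" pattern by only accepting entries exactly one directory level below the root. The function name `find_txt_one_level_down` is hypothetical, the match is case-sensitive (unlike `-iname`), and the root is assumed not to be "/". File::Find inspects every entry it visits, so it is an assumption, not a tested fact, that it copes with the >2GB file on the affected AIX systems.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return what "$root/*/*.txt" would match,
# i.e. .txt entries exactly one directory level below $root.
sub find_txt_one_level_down {
    my ($root) = @_;
    $root =~ s{/+\z}{};    # normalise "/foo/bar/" to "/foo/bar"
    my @found;

    find({
        no_chdir => 1,     # keep the cwd unchanged; $_ holds the full path
        wanted   => sub {
            my $path = $File::Find::name;
            return if $path eq $root;

            my @parts = split m{/}, substr($path, length($root) + 1);

            # Like glob's *, ignore hidden entries and do not descend into them.
            if (grep { /\A\./ } @parts) {
                $File::Find::prune = 1;
                return;
            }

            if (@parts == 2) {
                $File::Find::prune = 1;    # never go deeper than two levels
                push @found, $path if $parts[1] =~ /\.txt\z/;
            }
        },
    }, $root);

    return sort @found;
}

print "- $_\n" for find_txt_one_level_down("/foo/bar/");
```

Patterns taken from a configuration file, as mentioned in the comments, would need a more general matcher than this fixed two-level check.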

Community
Flow
  • "new" as in those new-fangled mobile radio-phones or that hop-hip music? – Richard Huxton Nov 16 '12 at 14:50
  • `File::Find` is a good recommendation; you should have led with that. Your "tried and tested" solution doesn't do the same thing the OP wanted. – dan1111 Nov 16 '12 at 15:21
  • I find the downvote demotivating. And your claim that my proposed solution "doesn't do the same thing"? Please explain what you think my solution does; maybe I DID misunderstand what the OP wanted to do.
    And while we are at it, why don't you offer a better solution instead? THAT would be more constructive, and we would all learn in the process.
    – Flow Nov 16 '12 at 15:41
  • @Flow, the OP was searching the first level of subdirectories, which your answer doesn't appear to do. Also, using a module is almost always considered better practice than running a system command, especially when it is a Perl core module. On top of that, your suggestion that glob has problems (other than people not knowing how to use it) isn't really supported by the search you linked to. I didn't downvote your post; sorry if my original comment was discouraging, but I think this needs improvement in order to be a quality answer. – dan1111 Nov 16 '12 at 15:51
  • I agree with you about the module vs system command - but in some cases, running a system command has been what worked better or faster. @Richard: haha – Flow Nov 16 '12 at 16:06
  • @Flow Thanks for the effort, but I'd prefer (and I need, for compatibility and portability) to keep the script as "pure Perl" as possible. The path/pattern used for the glob() is taken from a configuration file and could be much more complex than "/foo/bar/*/*.txt". – roovalk Nov 16 '12 at 16:12
  • You probably mean well, but using `qx` or `system` to do a job in Perl is generally bad advice, especially for something so simple. For instance, some novices have trouble using '`open`' ... but that doesn't mean you should suggest doing `system('cat', $file)`, I mean gosh, you'd think Perl lacked any IO without calling system functions if you do that ;) – Kent Fredric Nov 18 '12 at 06:35