
I need to find all the PDF files under several price list folders using JRuby on Windows 7. The folder structure is as follows:

WorkSpace/Data/2015/city1/A/...
WorkSpace/Data/2015/city1/B/...
WorkSpace/Data/2015/city1/Pricelist/...
WorkSpace/Data/2015/city1/...
WorkSpace/Data/2015/city1/Price List/.....
WorkSpace/Data/2015/city2/A/...
WorkSpace/Data/2015/city2/C/...
WorkSpace/Data/2015/city2/Pricelist/...
WorkSpace/Data/2015/city2/D/...
WorkSpace/Data/2015/city2/Price List/.....

WorkSpace/Data/2016/city1/folder1/...
WorkSpace/Data/2016/city1/folder2/...
WorkSpace/Data/2016/city1/Pricelist/...
WorkSpace/Data/2016/city1/folder3/...
WorkSpace/Data/2016/city1/folder4/Price List/...
WorkSpace/Data/2016/city2/folder1/...
WorkSpace/Data/2016/city2/folder2/...
WorkSpace/Data/2016/city2/Pricelist/...
WorkSpace/Data/2016/city2/folder3/...
WorkSpace/Data/2016/city2/folder4/Price List/...

... represents all kinds of files under their corresponding folder.

I only want to find the PDF files under the folders named Pricelist and Price List. How can I do this?

I read "Searching a folder and all of its subfolders for files of a certain type". One answer there looks helpful, but how can I modify the expression /.*\.pdf$/ to achieve my goal?

Leo

2 Answers


Use a Recursive Glob

All you need to find your files is Dir.glob and Enumerable#grep. For example:

Dir.glob('WorkSpace/Data/**/*.pdf').grep /Price List|Pricelist/

This collects all the PDF files using a recursive glob pattern that descends into every subdirectory starting at WorkSpace/Data (adjust the path to this starting directory as needed), and then keeps only the results whose path matches the directories you're grepping for. In this case, we're using a regular expression with alternation to find either of the two directory names you're looking for, without regard to how deeply nested those directories might be.

There may be more efficient ways to do this, or you may need to tweak the regex if it's too permissive for you, but this certainly solves the problem without needing to know much more than the root of the directory tree you want to search.
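If the plain alternation turns out to be too permissive (it also matches a file that is itself named Pricelist.pdf, for instance), one possible tightening is sketched below. It is an illustration rather than part of the tested one-liner above: the constant PRICE_LIST_PDF and the method price_list_pdfs are made-up names, and the regex assumes forward-slash separators, which is what Dir.glob returns.

```ruby
# Hypothetical helper: the constant and method names are illustrative only.
# The pattern requires "Pricelist" or "Price List" to be a whole directory
# component, followed somewhere below it by a name ending in ".pdf".
# The trailing "i" makes both the directory name and the extension
# case-insensitive.
PRICE_LIST_PDF = %r{(?:\A|/)(?:Pricelist|Price List)/.*\.pdf\z}i

def price_list_pdfs(root)
  Dir.glob(File.join(root, '**', '*'))    # every entry below root
     .grep(PRICE_LIST_PDF)                # keep paths under a price list directory
     .select { |path| File.file?(path) }  # drop directories whose name ends in .pdf
end
```

Calling price_list_pdfs('WorkSpace/Data') should then behave like the one-liner, except that it ignores letter case and skips files that merely have "Pricelist" in their own name.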

Todd A. Jacobs
  • I know the logic of your code, but I don't know why there is no output in my program. Still thanks a lot ! – Leo Feb 26 '16 at 05:37
  • @Leo If you're still having problems (e.g. getting `[]` as a result), it's probably because: 1) `WorkSpace/Data/` isn't under your current working directory. In that case, use a fully-qualified path, or change directory to the top of the search tree first; or 2) Your PDF files or directory names are mixed case, in which case you'll need to adjust your glob and regex to be case-insensitive. The solution was tested with JRuby 9.0.5.0 on my system, and works as described. Glad it's at least pointing you in the right direction. – Todd A. Jacobs Feb 26 '16 at 13:29

You'll probably want to look at the Find module. The code would be something like this:

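require 'find'

results = []
directory_list = []

# Pass 1: collect every directory named 'Pricelist' or 'Price List'.
Find.find('WorkSpace/Data') do |path|
    if FileTest.directory?(path)
        fn = File.basename(path)
        if fn == 'Pricelist' || fn == 'Price List'
            directory_list << path
            Find.prune  # do not descend; pass 2 scans this subtree
        end
    end
end

# Pass 2: collect the PDF files below each directory found in pass 1.
directory_list.each do |starting_path|
    Find.find(starting_path) do |path|
        if File.extname(path) == '.pdf'
            results << path
        end
    end
end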

The first loop scans and finds all the directories that match the directory name condition, skipping scanning below them because that will happen in the second loop. The second loop takes each of the directories found by the first loop and scans them for files ending in the '.pdf' extension, adding each one to the results list.

You can hoist the second loop's body up into the first loop in place of directory_list << path, but the resulting code would be harder to read and wouldn't gain any performance improvement.
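For comparison, a hoisted single-loop variant might look roughly like the sketch below. It is only an illustration of the idea: the method name pdfs_in_price_list_dirs and the constant PRICE_LIST_DIRS are made up for the example, and the extension check is case-sensitive, just like the two-loop version.

```ruby
require 'find'

# Hypothetical names, used for illustration only.
PRICE_LIST_DIRS = ['Pricelist', 'Price List'].freeze

def pdfs_in_price_list_dirs(root)
  results = []
  Find.find(root) do |path|
    # Only directories with one of the wanted names are interesting here.
    next unless File.directory?(path) && PRICE_LIST_DIRS.include?(File.basename(path))

    # Inner traversal: collect every .pdf file below the matching directory.
    Find.find(path) do |inner|
      results << inner if File.file?(inner) && File.extname(inner) == '.pdf'
    end

    # The subtree has been scanned already, so the outer traversal skips it.
    Find.prune
  end
  results
end
```

The nesting is what makes this harder to follow: the inner Find.find has to finish before Find.prune tells the outer traversal to skip the directory it has just scanned.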

Todd Knarr
  • Sorry, I gave the wrong file structure. I will edit my question; could you give another answer? Thanks! – Leo Feb 26 '16 at 03:19
  • There is a "NoMethodError: undefined method `append` for []:Array" error when executing the line directory_list.append(path) – Leo Feb 26 '16 at 04:02
  • Sorry, I'm switching between Python and Ruby too much. In Ruby array append is the `<<` operator, not the `append()` method. – Todd Knarr Feb 26 '16 at 04:17
  • And also, it should be '.pdf' – Leo Feb 26 '16 at 04:25