Dir.glob to get all csv and xls files in folder

Question

folder_to_analyze = ARGV.first
folder_path = File.join(Dir.pwd, folder_to_analyze)

unless File.directory?(folder_path)
  puts "Error: #{folder_path} is not a valid folder."
  exit
end

def get_csv_file_paths(path)
  files = []
  Dir.glob(path + '/**/*.csv').each do |f|
    files << f
  end
  return files
end

def get_xlsx_file_path(path)
  files = []
  Dir.glob(path + '/**/*.xls').each do |f|
    files << f
  end
  return files
end

files_to_process = []
files_to_process << get_csv_file_paths(folder_path)
files_to_process << get_xlsx_file_path(folder_path)
puts files_to_process[1].length # Not what I want, I want:
# puts files_to_process.length

I'm trying to write a simple Ruby script that I can call from the command line, like ruby counter.rb mailing_list1, so that it goes to the given folder and counts all .csv and .xls files.

I intend to operate on each file, getting a row count, etc.

Currently the files_to_process array is actually an array of arrays, which I don't want. I want a single array containing both the .csv and .xls files.

Since I don't know how to yield from the Dir.glob call, I added them to an array and returned that.

How can I accomplish this using a single array?

score 52 · Answer 1 · answered Mar 29 '16 at 20:43

Just stick the file extensions together into one group:

Dir[path + "/**/*.{csv,xls}"]
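
As a minimal sketch of how the brace pattern yields one flat array, the following wraps it in a helper and runs it against a throwaway directory. The helper name spreadsheet_paths and the sample file names are assumptions made for illustration; they are not part of the original answer.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (name is an assumption): returns one flat, sorted
# array of every .csv and .xls path found under root, at any depth.
def spreadsheet_paths(root)
  Dir.glob(File.join(root, "**", "*.{csv,xls}")).sort
end

# Demo against a temporary directory tree with made-up file names.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "nested"))
  %w[a.csv b.xls notes.txt nested/c.csv].each do |name|
    FileUtils.touch(File.join(root, name))
  end

  files_to_process = spreadsheet_paths(root)
  puts files_to_process.length
end
```

Because the braces are expanded inside a single pattern, there is nothing to flatten afterwards, which is the difference from pushing two separate glob results into one array.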

C.J.


IMO, this is a more rubyist answer. It's one line and legible. Though I would have written Dir["#{path}/**/*.{csv,xls}"]; clearly six of one and half a dozen of the other. – Merovex Nov 06 '16 at 13:40

Sergio Tulentsev · Accepted Answer · 2013-06-05T15:40:59.707

Well, yielding is simple. Just yield.

def get_csv_file_paths(path)
  Dir.glob(path + '/**/*.csv').each do |f|
    yield f
  end
end

def get_xlsx_file_path(path)
  Dir.glob(path + '/**/*.xls').each do |f|
    yield f
  end
end

files_to_process = []
get_csv_file_paths(folder_path) {|f| files_to_process << f }
get_xlsx_file_path(folder_path) {|f| files_to_process << f }

puts files_to_process.length

Every method in Ruby can be passed a block, and the yield keyword sends data to that block. If the block may or may not be provided, yield is usually paired with block_given?:

yield f if block_given?
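
A common companion to block_given? is returning an Enumerator when no block is supplied, so the same method serves both calling styles. The sketch below assumes a hypothetical method name, each_csv_path, and made-up sample files; it is one possible arrangement, not the answerer's code.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (name is an assumption): yields each .csv path under
# root, or returns an Enumerator when called without a block.
def each_csv_path(root)
  return enum_for(__method__, root) unless block_given?

  Dir.glob(File.join(root, "**", "*.csv")) { |path| yield path }
end

Dir.mktmpdir do |root|
  %w[one.csv two.csv skip.txt].each { |n| FileUtils.touch(File.join(root, n)) }

  # Block form: the caller decides what to do with each path.
  files_to_process = []
  each_csv_path(root) { |f| files_to_process << f }
  puts files_to_process.length

  # Blockless form: an Enumerator that can be chained or counted.
  puts each_csv_path(root).count
end
```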

Update

The code can be further simplified by passing your block directly to glob.each:

def get_csv_file_paths(path, &block)
  Dir.glob(path + '/**/*.csv').each(&block)
end

def get_xlsx_file_path(path, &block)
  Dir.glob(path + '/**/*.xls').each(&block)
end

This block/proc conversion is a somewhat advanced topic, though.
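
To make the conversion concrete: an &block parameter captures the caller's block as a Proc, and writing &block at a call site turns that Proc back into a block. The sketch below uses a hypothetical method name, each_spreadsheet, and made-up sample files, and folds both extensions into one pattern for brevity.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (name is an assumption). `&block` in the parameter
# list captures the caller's block as a Proc; `each(&block)` hands that
# Proc to Array#each as its block.
def each_spreadsheet(root, &block)
  Dir.glob(File.join(root, "**", "*.{csv,xls}")).each(&block)
end

Dir.mktmpdir do |root|
  %w[a.csv b.xls c.txt].each { |n| FileUtils.touch(File.join(root, n)) }

  files_to_process = []
  each_spreadsheet(root) { |f| files_to_process << f }
  puts files_to_process.length
end
```

If the method is called without a block, block is nil and each(&nil) behaves like a plain each, which returns an Enumerator.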

@Merovex: note, however, that this post answers what I thought is the direct question ("how do I yield from this?"), rather than gift-wrapping a complete solution. — Sergio Tulentsev, Nov 06 '16 at 13:47
Fair enough, though the "yield" part of the request is buried down in the narrative. Might have helped to mention that in the base question. ;-) — Merovex, Dec 04 '16 at 20:53

score 2 · Answer 3 · answered Jun 05 '13 at 15:57

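def get_folder_paths(root_path)
  # Search under root_path rather than the current working directory.
  Dir.glob(File.join(root_path, '**', '*.csv')) + Dir.glob(File.join(root_path, '**', '*.xls'))
end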

folder_path = File.join(Dir.pwd, ARGV.first || '')
raise "#{folder_path} is not a valid folder" unless File.directory?(folder_path)

puts get_folder_paths(folder_path).length

The get_folder_paths method returns an array of CSV and XLS file paths. Building an array of file names may not be what you really want, especially if there are a lot of them. Note that Dir.glob without a block returns an Array, not an Enumerator; if you do not need the file count first, passing a block to Dir.glob, which is called once per matching path, would be more appropriate in that case.
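
As a rough sketch of working on each match without collecting the paths yourself, Dir.glob can be given a block directly. The helper name count_by_extension and the sample files are assumptions for illustration only.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (name is an assumption): tallies matches per file
# extension inside the Dir.glob block, without storing the paths.
def count_by_extension(root)
  counts = Hash.new(0)
  Dir.glob(File.join(root, "**", "*.{csv,xls}")) do |path|
    counts[File.extname(path)] += 1
  end
  counts
end

Dir.mktmpdir do |root|
  %w[a.csv b.csv c.xls d.txt].each { |n| FileUtils.touch(File.join(root, n)) }
  count_by_extension(root).each do |ext, total|
    puts "#{ext}: #{total}"
  end
end
```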
