folder_to_analyze = ARGV.first
folder_path = File.join(Dir.pwd, folder_to_analyze)

unless File.directory?(folder_path)
  puts "Error: #{folder_path} is not a valid folder."
  exit 1
end

def get_csv_file_paths(path)
  files = []
  Dir.glob(path + '/**/*.csv').each do |f|
    files << f
  end
  return files
end

def get_xlsx_file_path(path)
  files = []
  Dir.glob(path + '/**/*.xls').each do |f|
    files << f
  end
  return files
end

files_to_process = []
files_to_process << get_csv_file_paths(folder_path)
files_to_process << get_xlsx_file_path(folder_path)
puts files_to_process[1].length # Not what I want, I want:
# puts files_to_process.length

I'm trying to write a simple Ruby script that I can call from the command line, like ruby counter.rb mailing_list1, which goes into that folder and counts all the .csv and .xls files in it.

Later I intend to operate on each file (getting a row count, etc.).

Currently the files_to_process array is actually an array of arrays, which I don't want. I want a single array containing both the .csv and the .xls files.
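The nesting comes from Array#<<, which appends its argument as a single element, so appending an array adds one element that happens to be an array. Array#concat and Array#+ add the elements themselves instead. A minimal sketch with hard-coded file names standing in for the Dir.glob results:

```ruby
# Hard-coded stand-ins for what the two Dir.glob calls might return.
csv_files = ['a.csv', 'b.csv']
xls_files = ['c.xls']

nested = []
nested << csv_files    # << appends the whole array as ONE element
nested << xls_files
p nested               # => [["a.csv", "b.csv"], ["c.xls"]]
p nested.length        # => 2

flat = []
flat.concat(csv_files) # concat appends each element individually
flat.concat(xls_files)
p flat                 # => ["a.csv", "b.csv", "c.xls"]

# Array#+ returns a new array holding the elements of both operands.
combined = csv_files + xls_files
p combined.length      # => 3
# nested.flatten would also give the same flat array after the fact.
```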

Since I don't know how to yield from the Dir.glob call, I added them to an array and returned that.

How can I accomplish this using a single array?

sergserg

3 Answers


Just combine the file extensions into one brace group (Dir[] is shorthand for Dir.glob):

Dir[path + "/**/*.{csv,xls}"]
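To check what the brace pattern matches, the sketch below builds a throwaway directory tree (the file names are made up for the demo) and runs the one-liner against it. The {csv,xls} part expands to two alternatives, and ** recurses into subdirectories.

```ruby
require 'tmpdir'
require 'fileutils'

found = nil

# Throwaway directory tree with hypothetical file names.
Dir.mktmpdir do |path|
  FileUtils.mkdir_p(File.join(path, 'sub'))
  %w[a.csv b.xls sub/c.csv sub/d.txt].each do |name|
    FileUtils.touch(File.join(path, name))
  end

  # One glob call picks up both extensions at every depth.
  found = Dir[path + "/**/*.{csv,xls}"]
  puts found.length  # => 3 (d.txt is not matched)
end
```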
C.J.
  • IMO, this is a more Rubyist answer: it's one line and legible. I would have written Dir["#{path}/**/*.{csv,xls}"], though that's clearly six of one and half a dozen of the other. – Merovex Nov 06 '16 at 13:40

Well, yielding is simple. Just yield.

def get_csv_file_paths(path)
  Dir.glob(path + '/**/*.csv').each do |f|
    yield f
  end
end

def get_xlsx_file_path(path)
  Dir.glob(path + '/**/*.xls').each do |f|
    yield f
  end
end

files_to_process = []
get_csv_file_paths(folder_path) {|f| files_to_process << f }
get_xlsx_file_path(folder_path) {|f| files_to_process << f }

puts files_to_process.length

Every method in Ruby can be passed a block, and the yield keyword sends data to that block. If the block is optional, guard yield with block_given?, because calling yield when no block was passed raises a LocalJumpError:

yield f if block_given?
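A related idiom, shown here as an assumption rather than as part of this answer, uses block_given? to return an Enumerator when no block is passed, so the same method works both with a block and as a chainable collection. FileFinder and each_csv_path are hypothetical names for this sketch.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper module, not from the answer above.
module FileFinder
  # Yields each CSV path when a block is given; otherwise returns an
  # Enumerator that runs the same glob when it is iterated.
  def self.each_csv_path(path)
    return enum_for(:each_csv_path, path) unless block_given?

    Dir.glob(path + '/**/*.csv').each { |f| yield f }
  end
end

block_count = nil
enum_count = nil

Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'a.csv'))
  FileUtils.touch(File.join(dir, 'b.csv'))

  collected = []
  FileFinder.each_csv_path(dir) { |f| collected << f } # block form
  block_count = collected.length

  enum_count = FileFinder.each_csv_path(dir).count     # no block: Enumerator
end

puts block_count # => 2
puts enum_count  # => 2
```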

Update

The code can be further simplified by passing your block directly to glob.each:

def get_csv_file_paths(path, &block)
  Dir.glob(path + '/**/*.csv').each(&block)
end

def get_xlsx_file_path(path, &block)
  Dir.glob(path + '/**/*.xls').each(&block)
end

This block/proc conversion is a somewhat advanced topic, though.
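As a small illustration of that conversion (capture and each_item are hypothetical names for this sketch): &block in a parameter list turns the caller's block into a Proc, and & in a call turns a Proc back into a block. One side effect worth knowing: if no block is passed, block is nil, and each(&nil) behaves like a bare each, returning an Enumerator.

```ruby
# &block in a parameter list turns the caller's block into a Proc object.
def capture(&block)
  block
end

doubler = capture { |x| x * 2 }
p doubler.class    # => Proc
p doubler.call(21) # => 42

# & in a call turns a Proc back into a block, so it can be handed on
# unchanged to another method, here Array#each.
def each_item(items, &block)
  items.each(&block)
end

seen = []
each_item([1, 2, 3]) { |n| seen << n }
p seen             # => [1, 2, 3]

# Without a block, block is nil and items.each(&nil) behaves like a bare
# items.each, which returns an Enumerator.
p each_item([1, 2, 3]).class # => Enumerator
```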

Sergio Tulentsev
  • @Merovex: note, however, that this post answers what I thought is the direct question ("how do I yield from this?"), rather than gift-wrapping a complete solution. – Sergio Tulentsev Nov 06 '16 at 13:47
  • Fair enough, though the "yield" part of the request is buried down in the narrative. Might have helped to mention that in the base question. ;-) – Merovex Dec 04 '16 at 20:53
def get_folder_paths(root_path)
  Dir.glob(File.join(root_path, '**', '*.csv')) + Dir.glob(File.join(root_path, '**', '*.xls'))
end

folder_path = File.join(Dir.pwd, ARGV.first || '')
raise "#{folder_path} is not a valid folder" unless File.directory?(folder_path)

puts get_folder_paths(folder_path).length

The get_folder_paths method returns an array of the paths of all CSV and XLS files under root_path. Building an array of file names may not be what you really want, especially if there are a lot of them. Passing a block to Dir.glob, which then yields each matching path in turn instead of returning an Array, would be more appropriate in that case, provided you do not need the file count before you start processing.
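A minimal sketch of the block form, assuming you only need to visit each file once (count_data_files is a hypothetical name). Dir.glob with a block yields each matching path and returns nil, so this code never holds an array of names itself; the counter stands in for whatever per-file work you need.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: counts CSV and XLS files by passing a block to
# Dir.glob, which yields each matching path instead of returning an array.
def count_data_files(root_path)
  count = 0
  Dir.glob(File.join(root_path, '**', '*.{csv,xls}')) { |_path| count += 1 }
  count
end

total = nil
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'nested'))
  %w[one.csv two.xls nested/three.csv notes.txt].each do |name|
    FileUtils.touch(File.join(dir, name))
  end
  total = count_data_files(dir)
end

puts total # => 3
```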

Catnapper