
I've searched the web and haven't found much information on this topic. I want to write some Ruby code that scans my hard drive for particular information (telephone numbers, for example). I've already built several regexes to find what I'm looking for; however, I'm unsure what the core logic for applying them to the files would look like. Any suggestions?

Anconia
  • Are you planning on doing a pure Ruby implementation? In *nix environments you can leverage the existing programs/filters to accomplish this. – Candide Nov 24 '12 at 03:34
  • @Candide I mentioned Ruby because it is what I am most familiar with, and I would ultimately like to build this into a Rails app. But I'm open to suggestions! – Anconia Nov 24 '12 at 03:46

1 Answer


You should walk the filesystem, recursing into directories, and apply your routine to every file. I'd just use the find command, starting from your filesystem root, and hand each file it reports to your routine. I don't know how one would do it in pure Ruby, but I look forward to others offering their nuggets of wisdom.

Since you're using Ruby, here's some (untested) code I found:

require 'rubygems'
require 'alib'

# Count every regular file under the filesystem root.
count = 0
alib.util.find2 '/' do |entry, stat|
  count += 1 if stat.file?
end

It appears you need the alib gem to run this; let me know how it works out.
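
For a version that needs no extra gem, here is a minimal sketch using the Find module from Ruby's standard library. The names `scan_tree` and `PHONE_PATTERN` are hypothetical, chosen only for illustration, and the pattern covers just numbers shaped like 555-123-4567; substitute your own regexes. It assumes Ruby 2.1 or later for `String#scrub`.

```ruby
require 'find'

# Hypothetical pattern for illustration only; swap in your own regexes.
PHONE_PATTERN = /\b\d{3}[-.]\d{3}[-.]\d{4}\b/

# Walks +root+ recursively and returns a hash that maps each regular,
# readable file to the array of substrings in it that match +pattern+.
# Files without any match are left out of the hash.
def scan_tree(root, pattern = PHONE_PATTERN)
  hits = {}
  Find.find(root) do |path|
    # Skip directories, devices, sockets, and files we cannot read.
    next unless File.file?(path) && File.readable?(path)
    begin
      matches = []
      File.foreach(path) do |line|
        # scrub replaces invalid byte sequences, so binary data does not
        # make scan raise an encoding error.
        matches.concat(line.scrub.scan(pattern))
      end
      hits[path] = matches unless matches.empty?
    rescue SystemCallError, IOError
      # The file vanished or became unreadable mid-scan; move on.
      next
    end
  end
  hits
end

# Example call (the path is an assumption; pick your own starting point):
#   scan_tree('/home/me/Documents').each { |file, found| puts "#{file}: #{found.join(', ')}" }
```

Starting from a narrower directory than `/` is probably wise, since the root also contains virtual filesystems such as `/proc` on Linux. Reading line by line keeps memory use modest for text files, though a large binary file with no newlines would still be read as one long line.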

hd1
  • Thanks! Do any particular languages come to mind? As I mentioned in my first comment, I'd ultimately like to build this into a Rails app. – Anconia Nov 24 '12 at 03:49
  • The one you're most familiar with is the one I'd use. I am most familiar with Unix commands, followed by Java, Python, Perl, and so on... I'll look into crawling a filesystem in Ruby. – hd1 Nov 24 '12 at 03:50
  • I keep receiving the following error: `invalid multibyte char (US-ASCII)` when I run it. – Anconia Nov 24 '12 at 17:12
  • According to http://stackoverflow.com/questions/3678172/ruby-1-9-invalid-multibyte-char-us-ascii, put '# encoding: utf-8' at the top of the file. – hd1 Nov 24 '12 at 17:35