I'd like to write a utility function/module that provides simple wildcard/glob matching for strings. I'm not using regular expressions directly because the patterns will be supplied by users through some sort of configuration file. I could not find a stable gem for this; I tried joker but had problems setting it up.

The functionality I'm looking for is simple. For example, given the following patterns, here are the matches:

pattern | test-string         | match
========|=====================|====================
*hn     | john, johnny, hanna | true , false, false     # wildcard  , similar to /hn$/i
*hn*    | john, johnny, hanna | true , true , false     # like /hn/i
hn      | john, johnny, hanna | false, false, false     # /^hn$/i
*h*n*   | john, johnny, hanna | true , true , true
etc...

I'd like this to be as efficient as possible. I thought about creating regexes from the pattern strings, but that seemed rather inefficient to do at runtime. Any suggestions on the implementation? Thanks.

EDIT: I'm using Ruby 1.8.7.

sa125

2 Answers

16

I don't see why you think it would be inefficient. Predictions about this sort of thing are notoriously unreliable; you should establish that it is actually too slow before bending over backwards to find a faster way, and then profile to make sure this is where the problem lies. (By the way, switching to Ruby 1.9 gives a 3-4x speed boost on average.)
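
One way to check this before optimizing is to time it with Ruby's standard `Benchmark` module. The following is only a rough sketch: `glob_to_regex` is a hypothetical helper written for this illustration, and the iteration count is arbitrary. It compares rebuilding the regex on every match with building it once and reusing it.

```ruby
require 'benchmark'

# Hypothetical helper (not from the original post): turn a wildcard pattern
# into an anchored, case-insensitive Regexp. '*' matches any run of characters.
def glob_to_regex(glob)
  body = Regexp.escape(glob).gsub('\*', '.*')
  Regexp.new("\\A#{body}\\z", Regexp::IGNORECASE)
end

names = %w[john johnny hanna]
n = 20_000 # arbitrary iteration count, chosen only for illustration

Benchmark.bm(22) do |bm|
  # Worst case: rebuild the Regexp for every single match.
  bm.report('compile on every match') do
    n.times { names.each { |name| glob_to_regex('*hn*') =~ name } }
  end

  # Typical case: build the Regexp once and reuse it.
  compiled = glob_to_regex('*hn*')
  bm.report('compile once') do
    n.times { names.each { |name| compiled =~ name } }
  end
end
```

The absolute numbers will depend on the machine and Ruby version, so the point is only to measure on your own setup rather than guess.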

Anyway, it should be pretty easy to do this, something like:

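class Globber
  # Converts a wildcard pattern into a case-insensitive Regexp.
  # Regexp.escape turns '*' into '\*', which is then replaced with '.*?'.
  def self.parse_to_regex(str)
    escaped = Regexp.escape(str).gsub('\*','.*?')
    # \A and \z anchor the whole string; ^ and $ would only anchor a single line.
    Regexp.new "\\A#{escaped}\\z", Regexp::IGNORECASE
  end

  def initialize(str)
    @regex = self.class.parse_to_regex str
  end

  # Returns true or false rather than a match index.
  def =~(str)
    !!(str =~ @regex)
  end
end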


glob_strs = {
  '*hn'    => [['john', true, ], ['johnny', false,], ['hanna', false]],
  '*hn*'   => [['john', true, ], ['johnny', true, ], ['hanna', false]],
  'hn'     => [['john', false,], ['johnny', false,], ['hanna', false]],
  '*h*n*'  => [['john', true, ], ['johnny', true, ], ['hanna', true ]],
}

puts glob_strs.all? { |to_glob, examples|
  examples.all? do |to_match, expectation|
    result = Globber.new(to_glob) =~ to_match
    result == expectation
  end
}
# >> true
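
Since the patterns come from a configuration file, each one only needs to be compiled once, when the configuration is loaded. Here is a minimal sketch of that idea. The `PatternSet` class, the `config_lines` array standing in for the file contents, and the `ha*a` pattern are all made up for this example.

```ruby
# Hypothetical wrapper (names are illustrative, not from the original post):
# compiles every user-supplied pattern once and reuses the Regexp objects.
class PatternSet
  def initialize(globs)
    @regexes = globs.map do |glob|
      body = Regexp.escape(glob.strip).gsub('\*', '.*')
      Regexp.new("\\A#{body}\\z", Regexp::IGNORECASE)
    end
  end

  # True if the string matches at least one of the configured patterns.
  def match?(str)
    @regexes.any? { |regex| regex =~ str }
  end
end

# Stand-in for lines read from a configuration file.
config_lines = ["*hn\n", "ha*a\n"]
patterns = PatternSet.new(config_lines)

%w[john johnny hanna].each do |name|
  puts "#{name}: #{patterns.match?(name)}"
end
# john: true
# johnny: false
# hanna: true
```

With this shape, the cost of building the regexes is paid once per pattern, not once per tested string.
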
Joshua Cheek
  • I think in the case of `'*hn'`, for example, he also needs `'john is awesome'` to return true, and `/.*hn$/` will not match that – Tudor Constantin Jun 23 '11 at 05:06
  • Doesn't seem to be the way globs work on my computer (Mac OSX Leopard) https://gist.github.com/1041942 – Joshua Cheek Jun 23 '11 at 05:10
  • I suppose wildcard is more accurate than glob for my purpose. For `'*hn'`, I'd like everything before and up to the pattern to match, and nothing after; so `true` for `'john'`, `false` for `'john is ..'`. Thanks – sa125 Jun 23 '11 at 05:54
  • That is congruent with this solution. – Joshua Cheek Jun 23 '11 at 05:59
1
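def create_regex(pattern)
  # Escape regex metacharacters first, then turn each escaped '*' into a lazy wildcard.
  body = Regexp.escape(pattern).gsub('\*', '.*?')
  # Without a leading '*', the match must begin at the string start or after a non-word character.
  body = '(?:\A|\W)' + body if pattern[0, 1] != '*'
  # Without a trailing '*', the match must end at the string end or before a non-word character.
  body = body + '(?:\W|\z)' if pattern[-1, 1] != '*'
  Regexp.new(body)
end

This method should return your regexp.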

PS: it is not tested :D

Tudor Constantin