2

I need to store a regular expression related to other fields in a database table with ActiveRecord.

I found the to_s method in Regexp class which states

Returns a string containing the regular expression and its options (using the (?opts:source) notation. This string can be fed back in to Regexp::new to a regular expression with the same semantics as the original. (However, Regexp#== may not return true when comparing the two, as the source of the regular expression itself may differ, as the example shows). Regexp#inspect produces a generally more readable version of rxp.

So this seems to be a working solution, but it stores the expression with an unusual syntax, and to get the string to store I need to build it manually with /my-exp/.to_s. I also may not be able to edit the regexp directly. For instance, a simple regexp produces:

/foo/i.to_s # => "(?i-mx:foo)" 
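To see what the documentation note means in practice, here is a small round trip through the to_s form. It is only a sketch, and the variable names are made up for illustration:

```ruby
original = /foo/i
stored   = original.to_s          # the string that would go into the db column
restored = Regexp.new(stored)     # rebuilt from the stored string

puts stored                       # (?i-mx:foo)
puts restored.match?("FOO")       # true: same matching behaviour
puts restored == original         # false: source and options differ
puts restored.options             # 0: the flags now live inside the source
```

The restored regexp matches the same strings, but its source is the whole "(?i-mx:foo)" group, which is why Regexp#== reports the two as different.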

The other option is to eval the field content: I could store the plain expression in the db column and then call eval(record.pattern) to get the actual regexp. This works, and since I'm the only one responsible for managing the regexp records there should be no issues in doing that, except application bugs ;-)

Do I have other options? I'd prefer not to eval db fields, but on the other hand I don't want to work with a syntax I don't know.

Fabio

3 Answers

5

Use serialize to store your regex 'as-is':

class Foo < ActiveRecord::Base
  serialize :my_regex, Regexp
end

See the API doc to learn more about this.

m_x
  • I know that, but in that way I can't edit db fields directly. – Fabio Oct 25 '12 at 12:05
  • it is serialized in yaml, so you should be able to do it. You can access the uncasted db field with `attributes_before_type_cast[:my_regex]`. @apneadiving yes it does it automagically – m_x Oct 26 '12 at 10:36
  • Also, note that you can use any class you want to serialize / unserialize, so you can as well subclass `Regexp` and implement your own serialization / deserialization methods – m_x Oct 26 '12 at 11:05
  • note : details about custom serializers / deserializers can be found [here](http://stackoverflow.com/questions/2080347/activerecord-serialize-using-json-instead-of-yaml#5979949) – m_x Oct 26 '12 at 15:43
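As a rough illustration of the custom serializer idea from the comments: ActiveRecord's serialize can take a coder object that responds to load and dump. The RegexpCoder class below is hypothetical (not part of Rails), it stores the to_s form of the regexp, and how the coder is handed to serialize depends on the Rails version, so treat this as a sketch rather than a drop-in solution:

```ruby
# Hypothetical coder: dumps a Regexp to its to_s string and loads it back.
class RegexpCoder
  # Called when writing to the db column; nil stays nil.
  def self.dump(regexp)
    regexp&.to_s
  end

  # Called when reading from the db column; nil stays nil.
  def self.load(text)
    text.nil? ? nil : Regexp.new(text)
  end
end

column_value = RegexpCoder.dump(/foo/i)      # plain string for the column
regexp       = RegexpCoder.load(column_value)
puts regexp.match?("FOO")                    # true
```

Because the column then holds a plain string rather than YAML, it can also be edited by hand, at the cost of using the (?opts:source) notation.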
3

Not sure I understand your constraints exactly.

If you store a string in db, you could make a Regexp from it:

a = 'foo'
/#{a}/ # => /foo/
Regexp.new('dog', Regexp::EXTENDED + Regexp::MULTILINE + Regexp::IGNORECASE) # => /dog/mix

There are other constructors; see the docs.

The best way to avoid eval'd code is to store the regexp source in a string column and the flags in a separate integer column. The regexp can then be built with:

record = Record.new pattern: 'foo', flags: Regexp::IGNORECASE
Regexp.new record.pattern, record.flags # => /foo/i
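For the opposite direction, Regexp#source and Regexp#options give the two values to store. The StoredPattern struct below is a hypothetical stand-in for the ActiveRecord model, so this is a sketch of the idea rather than the actual model code:

```ruby
# Hypothetical stand-in for a record with `pattern` and `flags` columns.
StoredPattern = Struct.new(:pattern, :flags) do
  # Only keep the three user-visible flag bits (i, x, m).
  FLAG_MASK = Regexp::IGNORECASE | Regexp::EXTENDED | Regexp::MULTILINE

  def self.from_regexp(regexp)
    new(regexp.source, regexp.options & FLAG_MASK)
  end

  def to_regexp
    Regexp.new(pattern, flags)
  end
end

record = StoredPattern.from_regexp(/foo/i)
puts record.pattern               # foo
puts record.flags                 # 1
puts record.to_regexp == /foo/i   # true
```

Unlike the to_s approach, the pattern column keeps the usual syntax and the rebuilt regexp compares equal to the original.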
Fabio
apneadiving
  • This could be a nice approach to solve my issue, but it doesn't handle flags, am I right? What if I need to create `/foo/i`? – Fabio Oct 25 '12 at 11:36
  • flags handle it: `Regexp.new('cat', true) #=> /cat/i` – apneadiving Oct 25 '12 at 11:37
  • another example: `Regexp.new('dog', Regexp::EXTENDED + Regexp::MULTILINE + Regexp::IGNORECASE) => /dog/mix ` – apneadiving Oct 25 '12 at 11:40
  • Yes, I saw that, but in this way I can't "serialize" that information in a simple string field. – Fabio Oct 25 '12 at 11:42
  • 1
    I think he means if he stored 'dog' in the db, that wouldn't store all of the flags. You could easily store another record for flags if you want though. I would look at rails validations for hints as to how best to do this though - see other answer. – Kenny Grant Oct 25 '12 at 11:42
  • `Regexp::EXTENDED + Regexp::MULTILINE + Regexp::IGNORECASE` is a fixnum, storable in db :) – apneadiving Oct 25 '12 at 11:46
  • 1
    `Regexp::EXTENDED => 2, Regexp::MULTILINE => 4, Regexp::IGNORECASE => 1`, would be nice checkboxes – apneadiving Oct 25 '12 at 11:48
  • @apneadiving this can be the solution. So if I store the fields as a plain number and I feed the first arg of Regexp.new with my column I can use the usual syntax and there will be no eval'd code? – Fabio Oct 25 '12 at 11:52
  • This is just as dangerous as eval, consider a regexp string as follows: User.find(1).email='xxx@xxx.com' – Kenny Grant Oct 25 '12 at 11:59
  • @KennyGrant: So what? `Regexp.new("User.find(1).email='xxx@xxx.com'") => /User.find(1).email='xxx@xxx.com'/` – apneadiving Oct 25 '12 at 12:01
  • Sorry, I meant to enclose that example in #{}, see the simple example below with foo and bar, you can assign variables inside a regexp too, maybe it would be safe if you use single quotes on the Regexp.new call though? Not sure. – Kenny Grant Oct 25 '12 at 12:09
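The checkbox idea from the comments could look roughly like this. The helper names and the symbol keys are invented for illustration; only the Regexp constants come from Ruby itself:

```ruby
# Map of checkbox names (hypothetical) to the Regexp option bits.
FLAG_NAMES = {
  ignorecase: Regexp::IGNORECASE, # 1
  extended:   Regexp::EXTENDED,   # 2
  multiline:  Regexp::MULTILINE   # 4
}.freeze

# Combine the ticked checkboxes into one integer for the db column.
def flags_from_checkboxes(selected)
  selected.reduce(0) { |bits, name| bits | FLAG_NAMES.fetch(name) }
end

# Turn the stored integer back into the list of ticked checkboxes.
def checkboxes_from_flags(flags)
  FLAG_NAMES.select { |_name, bit| (flags & bit) != 0 }.keys
end

flags = flags_from_checkboxes([:ignorecase, :multiline])
puts flags                                   # 5
puts checkboxes_from_flags(flags).inspect    # [:ignorecase, :multiline]
puts Regexp.new('dog', flags).inspect        # /dog/mi
```

Using bitwise OR instead of + means ticking the same flag twice cannot corrupt the stored value.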
2

You can use #{} within regular expressions to insert variables, so you could insert a carefully cleaned regexp by storing "foo" in the db under record.pattern as a string, and then evaluating it with:

/#{record.pattern}/

So, in the db, you would store:

"pattern"

In your code, you could do:

if record.other_field =~ /#{record.pattern}/
  # do something
end

This compiles the regexp from a dynamic string in the db that you can change, and allows you to use it in code. I wouldn't recommend it for security reasons though, see below:

Obviously this could be dangerous, as the regex can contain ruby code, so this is simpler, but in terms of danger, it is similar to eval:

a = "foo"
puts a
=> foo
b = "#{a = 'bar'}"
a =~ /#{b}/ 
puts a 
=> bar
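It is worth checking where the assignment in that snippet actually runs. As far as I can tell, it runs when the double-quoted literal for b is evaluated in Ruby source, not when b is interpolated into the regexp. A string that arrives as data (a single-quoted literal stands in for a db value here) is inserted as plain text. A small sketch with made-up variable names:

```ruby
counter = 0

# Single-quoted, so this is plain text, as a value read from a db column would be.
stored = '#{counter = 99}'

# Interpolating the string only inserts its characters into the pattern.
regexp = /#{stored}/

puts counter                      # 0: nothing inside the string was executed
puts regexp.source == stored      # true: the text became the pattern itself
```

This does not make arbitrary user-supplied patterns harmless in every respect, but it suggests that interpolation alone does not execute stored Ruby code the way eval does.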

For security, it may be worth decomposing your regex tests into something you can map to methods written in your code, so that the db only stores keys for the constraints, something like:

'alpha,numeric' etc.

You would then have hard-coded tests which you run depending on the keys stored. Perhaps look at Rails validations for hints here; although those are stored in code, this is probably the best approach (generalise your requirements, and keep the code out of the db). Even if you don't think you need security now, you might want it later, or you might forget about this and grant access to someone malicious.
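A minimal sketch of that key-based approach follows. The constraint names, their patterns, and the choice to treat a comma-separated list as "any of these" are all assumptions made for illustration:

```ruby
# Hard-coded tests; only the keys would be stored in the db.
CONSTRAINTS = {
  "alpha"   => /\A[[:alpha:]]+\z/,
  "numeric" => /\A\d+\z/
}.freeze

# Assumption: the value passes if it satisfies any of the listed keys.
# An unknown key raises KeyError instead of being silently ignored.
def satisfies?(value, keys)
  keys.split(",").any? { |key| CONSTRAINTS.fetch(key.strip).match?(value) }
end

puts satisfies?("hello", "alpha,numeric")   # true
puts satisfies?("12345", "alpha,numeric")   # true
puts satisfies?("abc123", "alpha,numeric")  # false
```

Since the db only ever holds keys from a fixed list, a bad record can at worst select the wrong predefined test.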

Kenny Grant
  • I asked this question because I'm concerned about security in the eval approach. What I was looking for is something like `Regexp.parse("/foo/i")` that produces the `/foo/i` regexp. That way I just need to handle errors for an invalid regexp and I do not have to worry about side effects. – Fabio Oct 25 '12 at 11:45
  • As I hope the answer above makes clear, any regexp can have side effects - it can contain code, and therefore can assign variables etc, so using a regexp is not safe. If you are happy that you have other controls in place though (like only having 1 user), the code above should easily store a regexp in the db, and let you execute it, which I think is what you wanted? Handling flags is a pretty trivial problem, which you can deal with by putting another field in if you want to. I would consider another direction for this though, trading a little flexibility for security. – Kenny Grant Oct 25 '12 at 11:50
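Ruby has no built-in `Regexp.parse`, but something close to what the question asks for can be sketched without eval. The parse_regexp_literal helper below is hypothetical; it accepts only the /source/flags form with the i, m and x flags and returns nil for anything else, including invalid patterns:

```ruby
FLAG_BITS = {
  "i" => Regexp::IGNORECASE,
  "m" => Regexp::MULTILINE,
  "x" => Regexp::EXTENDED
}.freeze

# Hypothetical helper: turns a string such as "/foo/i" into a Regexp.
# Returns nil when the text is not in /source/flags form or does not compile.
def parse_regexp_literal(text)
  match = %r{\A/(?<source>.*)/(?<flags>[imx]*)\z}m.match(text)
  return nil unless match

  options = match[:flags].each_char.reduce(0) { |bits, c| bits | FLAG_BITS.fetch(c) }
  Regexp.new(match[:source], options)
rescue RegexpError
  nil
end

puts parse_regexp_literal("/foo/i") == /foo/i    # true
puts parse_regexp_literal("/fo(o/").inspect      # nil (invalid pattern)
puts parse_regexp_literal("foo").inspect         # nil (not in /.../ form)
```

The column can then hold the familiar literal syntax, and the only failure mode to handle is a nil result.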