The data source I am working with is terrible. Some places where you would expect integers, you get "Three". In the phone number field, you may get "the phone # is xxx". Some fields are simply blank.

This is OK, as I'm parsing each field: "Three" will end up in my model as the integer 3, and phone numbers (and such) will be extracted via regex. Users of the service KNOW that the data is sketchy and incomplete; it's an unfortunate fact of the way our data source is maintained, and there's nothing we can do about it but step up our parsing game! As an aside, we are slowly producing our own version of the data as we parse more and more of the original, but this poor source has to do for now.
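
To make the kind of parsing described above concrete, here is a minimal sketch in plain Ruby. The helper names (parse_count, parse_phone), the word list, and the phone pattern are all assumptions for illustration, not the actual parser; each helper returns nil when it finds nothing usable, so the field can stay blank in the provisional model.

```ruby
# Hypothetical helpers; names, word list and pattern are illustrative only.
NUMBER_WORDS = {
  "zero" => 0, "one" => 1, "two" => 2, "three" => 3, "four" => 4,
  "five" => 5, "six" => 6, "seven" => 7, "eight" => 8, "nine" => 9, "ten" => 10
}.freeze

# Returns an Integer, or nil when the field is blank or unrecognised.
def parse_count(raw)
  text = raw.to_s.strip.downcase
  return nil if text.empty?
  return text.to_i if text.match?(/\A\d+\z/)
  NUMBER_WORDS[text]
end

# Assumed pattern: a digit, at least five digits/separators, a final digit.
PHONE_PATTERN = /\+?\d[\d\s().-]{5,}\d/

# Pulls the first phone-like run out of free text and strips separators.
def parse_phone(raw)
  match = raw.to_s[PHONE_PATTERN]
  match && match.gsub(/[^\d+]/, "")
end
```

A nil result is what later tells the correction form which fields the user still has to fill in.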

So users select the data they wish to parse, and we do what we can, returning a partial/incorrect model. Now the final model that we want to store should be validated - there are certain fields that can't be null, certain strings must adhere to a format and so on.

The flow of the app is:

  1. User tells the service which data to parse.
  2. Service goes off and grabs the data, parses what it can and returns a partial model with whatever data it could retrieve.
  3. We display the data to the user, allowing them to make corrections and to fill in any mandatory fields for which no data was collected.
  4. This user-corrected data is to be saved, and therefore validated.
  5. If validation fails, show data again for user to make fixes, rinse & repeat.

What is the best way to go about having a model which starts off being potentially completely invalid or containing no data, but which needs to be validated eventually? The two ways I've thought of (and partially implemented) are:

  1. 2 models - a Data model, which has validations etc, and an UnconfirmedData model, which has no validations. The original data is put into an UnconfirmedData model until the user has made their corrections, at which point it is put into a Data model and validation is attempted.
  2. One model, with a "confirmed data" flag, with validation being performed manually rather than Rails' validation.

In practice I lean towards using 2 models, but I'm pretty new to Rails so I thought there may be a nicer way to do this. Rails has a habit of surprising me like that :)

Richter

2 Answers

Must you save your data in between requests? If so, I would use your two-model format, but use Single Table Inheritance (STI) to keep things DRY.

The first model, the one responsible for the parsing and the rendering and the doing-the-best-it-can, shouldn't have any validations or restrictions on saving it. It should, however, have the type column in the migration so you can use the inheritance goodness. If you don't know what I'm talking about, read up on the wealth of information on STI; a good place to start would be a definitive guide.

The second model would be the one you would use in the rest of the application, the strict model, the one which has all the validations. Every time a user submitted reworked and potentially valid data, your app would try and move your instance of the open model created from the params, to an instance of the second model, and see if it was valid. If it was, save it to the database, and the type attribute will change, and everything will be wonderful. If it isn't valid, save the first instance, and return the second instance to the user so the validation error messages can be used.

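# Permissive base model: no validations, so partial data can always be saved.
class ArticleData < ActiveRecord::Base
  def parse_from_url(url)
    # parses some stuff from the data source
  end
end

# Strict subclass stored in the same table via the STI type column.
class Article < ArticleData
  validates_presence_of :title, :body
  # title must be at least 20 characters
  validates_length_of :title, :minimum => 20
  # ...
end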

You'll need a pretty intense controller action to facilitate the above process, but it shouldn't be too difficult. In the rest of your application, make sure you run your queries on the Article model to only get back valid ones.
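
As a rough illustration of the step that controller action has to perform, here is a plain-Ruby sketch with no Rails involved, so it runs on its own. The classes are stand-ins for the two models, the promote helper is hypothetical, and the 20-character title rule is read here as a minimum. In a real Rails app, ActiveRecord's becomes and valid? would probably do this work instead.

```ruby
# Plain-Ruby stand-in for the permissive model (assumption: two fields only).
class ArticleData
  attr_accessor :title, :body

  def initialize(attrs = {})
    @title = attrs[:title]
    @body  = attrs[:body]
  end

  def to_h
    { title: title, body: body }
  end
end

# Stand-in for the strict model; the rules imitate Rails validations.
class Article < ArticleData
  # Collects human-readable problems; an empty list means valid.
  def errors
    problems = []
    problems << "Title can't be blank" if title.to_s.strip.empty?
    problems << "Body can't be blank" if body.to_s.strip.empty?
    problems << "Title is too short (minimum is 20 characters)" if title.to_s.length < 20
    problems
  end

  def valid?
    errors.empty?
  end
end

# Hypothetical helper: returns [record_to_store, errors_to_show].
# A valid draft is promoted to Article; otherwise the draft is kept.
def promote(draft)
  candidate = Article.new(draft.to_h)
  candidate.valid? ? [candidate, []] : [draft, candidate.errors]
end
```

When promotion fails, the draft is what gets stored again and the error list is what the form shows the user for the next round of corrections.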

Hope this helps!

hornairs
  • Thanks a lot for the clarification, I do think this is the way I'll go. The data will be moved between requests, and maybe even sessions so it will be useful to have the provisional data stored; this will probably be more important in the future than now. Thanks for the railsforum link also, very useful. – Richter Jan 08 '11 at 16:10

Using one model should be straightforward enough. You'll need an attribute/method to determine whether the validations should be performed. You can pass :if => to bypass/enable them:

validates_presence_of :title, :if => :should_validate

should_validate can be a simple boolean attribute that returns false when the model instance is "provisional", or a more complicated method if necessary.
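
For what it's worth, the guard idea can be imitated in plain Ruby to show the behaviour without loading Rails. FlaggedArticle, its provisional attribute, and the single title rule are assumptions made for this sketch; in Rails the :if option would call should_validate for you.

```ruby
# Plain-Ruby imitation of a conditionally validated model (not Rails code).
class FlaggedArticle
  attr_accessor :title, :provisional

  def initialize(title: nil, provisional: true)
    @title = title
    @provisional = provisional
  end

  # Plays the role of the :if condition: validate only confirmed records.
  def should_validate
    !provisional
  end

  # Returns no errors at all while the record is provisional.
  def errors
    return [] unless should_validate
    title.to_s.strip.empty? ? ["Title can't be blank"] : []
  end

  def valid?
    errors.empty?
  end
end
```

A provisional record with a missing title counts as valid here, and the same record starts failing as soon as provisional is set to false.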

zetetic
  • Ahh, that's the Rails goodness I was hoping for. However, after implementing both this and the method outlined above by hornairs, I think I'm going to go down the 2 model route. It's not quite as slick, but allows me to better separate valid from provisional data, which may be important for the future. Thanks very much for your reply though, I hadn't seen :if before, no doubt it will come in useful very soon. – Richter Jan 08 '11 at 16:08