The data source I am working with is terrible. Some places where you would expect integers, you get "Three". In the phone number field, you may get "the phone # is xxx". Some fields are simply blank.
This is OK: I'm parsing each field, so "Three" ends up in my model as the integer 3, and phone numbers (and the like) are extracted via regex. Users of the service KNOW that the data is sketchy and incomplete. It's an unfortunate fact of the way our data source is maintained, and there's nothing we can do about it but step up our parsing game! As an aside, we are slowly producing our own version of the data as we parse more and more of the original, but this poor source has to do for now.
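To give a rough idea of the kind of lenient parsing I mean, here is a minimal plain-Ruby sketch. The module name `LenientParser`, the word list, and the phone regex are all made up for illustration and are not the actual parser; the real patterns depend on what the source data looks like.

```ruby
# Hypothetical helper: names, word list and regex are illustrative assumptions.
module LenientParser
  NUMBER_WORDS = {
    "zero" => 0, "one" => 1, "two" => 2, "three" => 3, "four" => 4,
    "five" => 5, "six" => 6, "seven" => 7, "eight" => 8, "nine" => 9
  }.freeze

  # Returns an Integer, or nil when the field is blank or unrecognised.
  def self.integer(raw)
    text = raw.to_s.strip.downcase
    return nil if text.empty?
    return text.to_i if text.match?(/\A-?\d+\z/)

    NUMBER_WORDS[text]
  end

  # Pulls the first phone-like run out of free text and strips punctuation.
  # Returns nil when nothing phone-like is found.
  def self.phone(raw)
    match = raw.to_s.match(/\+?\d[\d\s().-]{5,}\d/)
    match && match[0].gsub(/[^\d+]/, "")
  end
end
```

Anything the parser can't make sense of comes back as `nil`, which is exactly why the resulting model starts out partial.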
So users select the data they wish to parse, and we do what we can, returning a partial/incorrect model. Now the final model that we want to store should be validated - there are certain fields that can't be null, certain strings must adhere to a format and so on.
The flow of the app is:
- User tells the service which data to parse.
- Service goes off and grabs the data, parses what it can and returns a partial model with whatever data it could retrieve.
- We display the data to the user, allowing them to make corrections and to fill in any mandatory fields for which no data was collected.
- This user-corrected data is to be saved, and therefore validated.
- If validation fails, show data again for user to make fixes, rinse & repeat.
What is the best way to go about having a model which starts off being potentially completely invalid or containing no data, but which needs to be validated eventually? The two ways I've thought of (and partially implemented) are:
- 2 models - a Data model, which has validations etc, and an UnconfirmedData model, which has no validations. The original data is put into an UnconfirmedData model until the user has made their corrections, at which point it is put into a Data model and validation is attempted.
- One model, with a "confirmed data" flag, with validation being performed manually rather than Rails' validation.
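To make the second option concrete, here is a stripped-down sketch in plain Ruby rather than actual Rails code. The class name `ParsedRecord`, the fields `name` and `phone`, and the 10-digit phone rule are assumptions chosen only for illustration.

```ruby
# Plain-Ruby sketch of the single-model option; not Rails code.
# Field names and the phone format are illustrative assumptions.
class ParsedRecord
  attr_accessor :name, :phone, :confirmed

  def initialize(name: nil, phone: nil, confirmed: false)
    @name = name
    @phone = phone
    @confirmed = confirmed
  end

  # Checks run only once the user has confirmed the data,
  # so a freshly parsed (unconfirmed) record never reports errors.
  def errors
    return [] unless confirmed

    problems = []
    problems << "name can't be blank" if name.to_s.strip.empty?
    problems << "phone must be 10 digits" unless phone.to_s.match?(/\A\d{10}\z/)
    problems
  end

  def valid?
    errors.empty?
  end
end
```

With this, a record straight out of the parser counts as valid while `confirmed` is false, and the same record is checked properly once the user submits their corrections and the flag is set.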
In practice I lean towards using 2 models, but I'm pretty new to Rails, so I thought there may be a nicer way to do this. Rails has a habit of surprising me like that :)