I'm building a diet analysis app in Rails 4.1. I have a model, FoodEntry
, which at a simple level has a quantity
value and references a Food
and a Measure
:
class FoodEntry < ActiveRecord::Base
belongs_to :food
belongs_to :measure
end
However I actually have two different types of measures, standard generic measures (cups, teaspoons, grams, etc.) and measures which are specific to a food (heads of broccoli, medium-sized bananas, large cans, etc.). Sounds like a case for a polymorphic association right? e.g.
class FoodEntry < ActiveRecord::Base
belongs_to :food
belongs_to :measure, polymorphic: true # Uses measure_id and measure_type columns
end
class StandardMeasure < ActiveRecord::Base
has_many :food_entries, as: :measure
end
class FoodMeasure < ActiveRecord::Base
has_many :food_entries, as: :measure
end
The thing is, the food-specific measures come from a legacy database dump. These records are uniquely identified by a combination of their food_id
and description
- they aren't supplied with a single-column primary key (description
is not unique on its own because there are multiple foods with the same measure description but different numeric data). Because I'm importing to my Rails Postgres db, I'm able to add a surrogate primary key - the auto-incrementing integer id
column that Rails expects. But I don't want to utilize this id
as a reference in my FoodEntry
model because it poses a pretty big challenge for keeping referential integrity intact when the (externally-supplied) data is updated and I have to reimport. Basically, those ids
are completely subject to change, so I'd much rather reference the food_id
and description
directly.
Luckily it's not very difficult to do this in Rails by using a scope on the association:
class FoodEntry < ActiveRecord::Base
belongs_to :food
belongs_to :measure, ->(food_entry) { where(food_id: food_entry.food_id) }, primary_key: 'description', class_name: 'FoodMeasure'
# Or even: ->(food_entry) { food_entry.food.measures }, etc.
end
Which produces a perfectly acceptable query like this:
> FoodEntry.first.measure
FoodMeasure Load (15.6ms) SELECT "food_measures".* FROM "food_measures" WHERE "food_measures"."description" = $1 AND "food_measures"."food_id" = '123' LIMIT 1 [["description", "Broccoli head"]]
Note that this assumes that measure_id
is a string column in this case (because description
is a string).
In contrast the StandardMeasure
data is under my control and doesn't reference Foods
, and so it makes perfect sense to simply reference the id
column in that case.
So the crux of my issue is this: I need a way for a FoodEntry
to reference only one type of measure, as it would in the polymorphic association example I made above. However I don't know how I'd implement a polymorphic association with respect to my measure
models because as it stands:
- an associated
FoodMeasure
needs to be referenced through a scope, while aStandardMeasure
doesn't. - an associated
FoodMeasure
needs to be referenced by a string, while aStandardMeasure
is referenced by an integer (and the columns being referenced have different names).
How do I reconcile these issues?
Edit: I think I should explain why I don't want to use the autonumber id
on FoodMeasures
as my foreign key in FoodEntries
. When the data set is updated, my plan was to:
- Rename the current
food_measures
table toretired_food_measures
(or whatever). - Import the new set of data into a new
food_measures
table (with a new set of autonumber ids). - Run a join between these two tables, then delete any common records in
retired_food_measures
, so it just has the retired records.
If I'm referencing those measures by food_id
and description
, that way I get the benefit that food entries automatically refer to the new records, and therefore any updated numeric data for a given measure. And I can instruct my application to go searching in the retired_food_measures
table if a referenced measure can't be found in the new one.
This is why I think using the id
column would make things more complicated, in order to receive the same benefits I'd have to ensure that every updated record received the same id
as the old one, every new record received a new not-used-before id
, and that any retired id
is never used again.
There's also one other reason I don't want to do this: ordering. The records in the dump are ordered first by food_id
, however the measures for any given food_id
are in a non-alphabetical but nevertheless logical order I'd like to retain. The id
column can serve this purpose elegantly (because ids are assigned in row order on import), but I lose this benefit the moment the ids start getting messed around with.
So yeah I'm sure I could implement solutions to these problems, but I'm not sure it would be worth the benefit?