2

I'm building a diet analysis app in Rails 4.1. I have a model, FoodEntry, which at a simple level has a quantity value and references a Food and a Measure:

class FoodEntry < ActiveRecord::Base
  belongs_to :food
  belongs_to :measure
end

However I actually have two different types of measures, standard generic measures (cups, teaspoons, grams, etc.) and measures which are specific to a food (heads of broccoli, medium-sized bananas, large cans, etc.). Sounds like a case for a polymorphic association right? e.g.

class FoodEntry < ActiveRecord::Base
  belongs_to :food
  belongs_to :measure, polymorphic: true # Uses measure_id and measure_type columns
end

class StandardMeasure < ActiveRecord::Base
  has_many :food_entries, as: :measure
end

class FoodMeasure < ActiveRecord::Base
  has_many :food_entries, as: :measure
end

The thing is, the food-specific measures come from a legacy database dump. These records are uniquely identified by a combination of their food_id and description - they aren't supplied with a single-column primary key (description is not unique on its own because there are multiple foods with the same measure description but different numeric data). Because I'm importing to my Rails Postgres db, I'm able to add a surrogate primary key - the auto-incrementing integer id column that Rails expects. But I don't want to utilize this id as a reference in my FoodEntry model because it poses a pretty big challenge for keeping referential integrity intact when the (externally-supplied) data is updated and I have to reimport. Basically, those ids are completely subject to change, so I'd much rather reference the food_id and description directly.

Luckily it's not very difficult to do this in Rails by using a scope on the association:

class FoodEntry < ActiveRecord::Base
  belongs_to :food
  belongs_to :measure, ->(food_entry) { where(food_id: food_entry.food_id) }, primary_key: 'description', class_name: 'FoodMeasure'
  # Or even:           ->(food_entry) { food_entry.food.measures }, etc.
end

Which produces a perfectly acceptable query like this:

> FoodEntry.first.measure
FoodMeasure Load (15.6ms)  SELECT  "food_measures".* FROM "food_measures"  WHERE "food_measures"."description" = $1 AND "food_measures"."food_id" = '123' LIMIT 1  [["description", "Broccoli head"]]

Note that this assumes that measure_id is a string column in this case (because description is a string).

In contrast the StandardMeasure data is under my control and doesn't reference Foods, and so it makes perfect sense to simply reference the id column in that case.

So the crux of my issue is this: I need a way for a FoodEntry to reference only one type of measure, as it would in the polymorphic association example I made above. However I don't know how I'd implement a polymorphic association with respect to my measure models because as it stands:

  • an associated FoodMeasure needs to be referenced through a scope, while a StandardMeasure doesn't.
  • an associated FoodMeasure needs to be referenced by a string, while a StandardMeasure is referenced by an integer (and the columns being referenced have different names).

How do I reconcile these issues?


Edit: I think I should explain why I don't want to use the autonumber id on FoodMeasures as my foreign key in FoodEntries. When the data set is updated, my plan was to:

  1. Rename the current food_measures table to retired_food_measures (or whatever).
  2. Import the new set of data into a new food_measures table (with a new set of autonumber ids).
  3. Run a join between these two tables, then delete any common records in retired_food_measures, so it just has the retired records.

If I'm referencing those measures by food_id and description, that way I get the benefit that food entries automatically refer to the new records, and therefore any updated numeric data for a given measure. And I can instruct my application to go searching in the retired_food_measures table if a referenced measure can't be found in the new one.

This is why I think using the id column would make things more complicated, in order to receive the same benefits I'd have to ensure that every updated record received the same id as the old one, every new record received a new not-used-before id, and that any retired id is never used again.

There's also one other reason I don't want to do this: ordering. The records in the dump are ordered first by food_id, however the measures for any given food_id are in a non-alphabetical but nevertheless logical order I'd like to retain. The id column can serve this purpose elegantly (because ids are assigned in row order on import), but I lose this benefit the moment the ids start getting messed around with.

So yeah I'm sure I could implement solutions to these problems, but I'm not sure it would be worth the benefit?

Inkling
  • 3,544
  • 4
  • 30
  • 44
  • See also: [What is the best design for a database table that can be owned by two different resources, and therefore needs two different foreign keys?](http://stackoverflow.com/a/13317463/533120) – Branko Dimitrijevic Jul 03 '14 at 10:27

1 Answers1

0

it poses a pretty big challenge for keeping referential integrity intact when the (externally-supplied) data is updated

This is an illusion. You have total control over the surrogates. You can do exactly the processing of external updates whether they are there or not.

This is just one of those times when you want your own new names for things, in this case Measures, of which FoodMeasures and StandardMeasures are subtypes. Have a measure_id in all three models/tables. You can find many idioms for simplifying subtype constraints, eg using type tags.

If you process external updates in such a way that it is convenient for such objects to also have such surrogates then you need to clearly separate such PutativeFoodMeasures and FoodMeasures as subtypes of some supertype PutativeOrProvenFoodMeasure and/or of PutativeOrProvenMeasure.

EDIT:

Your update helps. You have described what I did. It is not difficult to map old to new ids; join on old & new (food_id,description) and select old id (not a food_id!). You control ids; how can it matter to reuse ids compared to them not even existing otherwise? Ditto for sorting FoodMeasures; do as you would have. It is only when you mix them with StandardMeasures giving some result that you need order a mixture differently; but you would do that anyway whether or not a shared id existed. (Though "polymorphic:" may not be the best id sharing design.)

The Measures model offers measures; and when you know you have a FoodMeasure or StandardMeasure you can get at its subtype-particular parts.

philipxy
  • 14,867
  • 6
  • 39
  • 83
  • I'm having a hard time interpreting your answer. Could you give an example of how you would do it? – Inkling Jul 04 '14 at 00:13
  • Have you read Branko's comment's link? Re Measures: Your Measures-FoodMeasures-StandardMeasures is his notifiable-users-groups. – philipxy Jul 04 '14 at 00:32