7

I've the following self referential association:

class Action < ActiveRecord::Base
  # self referential association
  has_many :action_parents
  has_many :parents, through: :action_parents
  has_many :action_children, class_name: 'ActionParent', foreign_key: 'parent_id'
  has_many :children, through: :action_children, source: :action
  …
  def should_finish
    should_start + duration
  end

  def should_start
    # my_start is a field in db: if there are no parents (root) it will use this field
    return my_start if parents.empty?
    parents.map(&:should_finish).sort.last
  end
end

My problem is the fact that should_finish and should_start are calling each other and even if I preload the parents it keeps resulting in many queries:

Action.includes(:parents).last.should_finish
# a new query every time it checks for parents

Any idea on how to cache actions and parents?

EDIT - let me give some context:

# actions table:        actions_parents table:
# id | duration         task_id | parent_id
# 1  | 5                2       | 1
# 2  | 10               3       | 1
# 3  | 20               4       | 2
# 4  | 15               4       | 3
#
#                      |--------------|
#                      | action 2     |
#         |---------- >| duration: 10 |
#         |            |--------------|
#         |                     |
#  |--------------|             |--------->|--------------|
#  | action 1     |                        | action 4     |
#  | duration: 5  |                        | duration: 15 |
#  |--------------|             |--------->|--------------|
#         |                     |
#         |            |--------------|
#         |----------->| action 3     |
#                      | duration: 20 |
#                      |--------------|

PS: there are no circular dependencies.

Assuming I have a tree my_start field of some day at 10:00:00:

# action | should_start | should_finish
# -------------------------------------
# 1      | 10:00:00*    | 10:00:05
# 2      | 10:00:05     | 10:00:15
# 3      | 10:00:05     | 10:00:25
# 4      | 10:00:25**   | 10:00:40
#
# * value from db since there is no parent
# ** should_finish of parent with latest should_finish (action 3)

I thought it could preload all actions using Action.includes(:parents)

gabrielhilal
  • 10,660
  • 6
  • 54
  • 81
  • I love this question :D good luck :D Did you try do de-recursify this? What is the logic behind `should_start` & `should_finish`? Are they timestamps? – bbozo Feb 03 '16 at 15:12
  • Also, do parents and children form a tree or a graph with cycles? (i.e. can the child be it's grandfather or something in those lines) – bbozo Feb 03 '16 at 15:18
  • How often do `should_start`, `duration` and `should_finish` values change? Is it possible to pluck those values into a column and assume they will not change? – bbozo Feb 03 '16 at 15:41
  • @bbozo: I've edited my question to give some context. If you change the start time of a tree or duration of any task all those values will change, so for that reason I don't want to store them. Also there are no circular dependencies... – gabrielhilal Feb 03 '16 at 16:41
  • @gabrielhilal Can you shed some light on the RDBMS you are running? (PostgreSQL, MySQL, etc) – PinnyM Feb 03 '16 at 19:07
  • @gabrielhilal Then you're in luck, you can use the `WITH RECURSIVE` operation :) See below for more info - here's a gem that may help you as well: https://github.com/take-five/activerecord-hierarchical_query. – PinnyM Feb 03 '16 at 19:10
  • @PinnyM: the problem with those gems is the fact they allow you to have only one parent per object.... in my case the same action might have many parents and many children (for that reason I have the action_parents table)... I think I will take a look at `WITH RECURSIVE` operation – gabrielhilal Feb 03 '16 at 19:13
  • @gabrielhilal It appears that this gem does support the 2-table approach you have taken, although the documentation doesn't make this very clear. However, you can skip the gem entirely and write it in raw SQL, passing that to the Arel [`.from()`](https://github.com/rails/arel/blob/master/lib/arel/table.rb#L35) helper. Either approach should be viable. – PinnyM Feb 03 '16 at 19:18

3 Answers3

1

I'll throw a wild one before I know the specifics,

Assuming there are no prominent loops in the parents structure you can not help yourself by caching anything short of caching the entire table because every time you hit parents you will hit different parents for every action instance and no caching strategy, including the rails one, will rescue you short of moving the entire dataset into the cache.

Thing is, what you seem to be trying to do is really hard to do with a relational database and seems exactly the reason why graph databases were invented (see What are graph databases & When to use a graph database & Neo4j on Heroku)

Best thing you can do short of going to a graph database or caching your entire actions table is to optimize the queries (use pluck) and possibly rewrite them to a PLSQL function.

Plan B is to have your knowledge about your data come to your rescue,

  • do the values in should_start, duration and should_finish change? Does it change a lot?
  • is the data real-time critical? (i.e. s it OK to receive slightly outdated information every now and then)
  • does the way you structure data need to be more read-friendly or write-friendly?
  • leading to the question: does it make sense to make them database fields of Action model so you don't have to traverse every time you lookup?
    • i.e. do you do read operations much more than writes and
    • you can recalculate the calculated fields in a background job
  • are you accessing should_start and should_finish very often in a small time window?
  • how good are you with Neo4j :D
  • ....

EDIT 1

Only solution I see at the moment is to un-recursify the problem. Try this:

store in a string/text field the ids of the parent structure, for example

  • action 4 would have [1,2,3],
  • actions 2 & 3 would have [1] and
  • action 1 would have []

then when you map the ancestor_ids array into a hash of id => action

def ancestry_hash
  @ancestry_hash ||= Hash[ Action.includes(:action_parents).where(id: ancestor_ids).map{|action| [action.id, action]} ]
end

and then reimplement the recursive query to traverse this hash and not the activerecord tree or you will trigger additional queries. Something like:

def should_finish(id = self.id)
  should_start(id) + ancestry_hash[id].duration
end

def should_start(id = self.id)
  # my_start is a field in db: if there are no parents (root) it will use this field
  action = ancestry_hash[id]
  return my_start if action.action_parents.empty?
  action.action_parents.pluck(:parent_id).map{ |parent_id| should_finish(parent_id) }.sort.last
end

I didn't test the code, but I hope you get the idea, it should be close enough to this

bbozo
  • 7,075
  • 3
  • 30
  • 56
  • please note my edits... I don't want to store those fields in DB because I would have to update them (rebuild the tree) every time I change the tree start time or duration of an action. Unfortunately the real-time is critical... TBH I don't think it's a complex setup for a relational db, I just want to cache the actions and action_parents, so not sure why includes are not working in that case... – gabrielhilal Feb 03 '16 at 16:58
  • I don't think you can make `#includes` work for you, it will only join one level of parent. I would suggest de-recursifying the problem as far as AR is concerned, check out the edit. The trick is to fetch the records in one go and use plain old ruby to run around the array/hash and not AR – bbozo Feb 03 '16 at 18:11
  • @gabrielhilal Any luck? :) – bbozo Feb 05 '16 at 18:28
1

The problem:

In a nutshell, you have 2 issues. One is that you are actually attempting to preload more than you need (?!?), and the other is that Rails is not eager loading what you actually need because of the recursive nature of the logic.

To explain a bit further, consider this:

my_action.parents.map(&:parents).flatten.map(&:parents)

Rails will:

  1. first grab all parents for the given action
  2. then loop over each of those parents and grab their parents
  3. then flatten those 'grandparents' to an array, loop over each of them and fetch their parents

Note that in this case there isn't much point in eager loading the first-level parents since you are only starting with an instance of an Action - not a collection. Calling .parents will fetch all first-level parents for that action in one pass (which is what eager loading would have done anyhow).

So what happens when you are starting with a collection (ActiveRelation) instead of an instance?

Action.some_scope.includes(:parents).map(&:parents)

In this case, the parents of ALL actions included in the scope will be eager loaded. Calling .map(&:parents) will NOT trigger any further SQL calls, and this is the whole point of eager loading with includes(). However, there are 2 things that sort of defeat the whole purpose of this - and you are doing both of them :/

Firstly, your starting point isn't actually a collection of actions, as you are immediately calling .last. As such the fetching of all parents for ALL actions is meaningless - you only needed the 'last' one! Because of this, Rails is smart enough to scope down, and will only eager load the parents of the 'last' action. However, there wasn't much benefit to eager loading in this case, since calling .parents would have resulted in the same single query at a later time. (While there is a small benefit to loading in advance if later operations need to occur faster, that has limited utility in this case). So with or without the .includes statement, you would only have performed a single query to fetch parents for the 'last' action.

More importantly, you are recursively calling .parents on each of these parents, and Rails had absolutely no idea that you were going to do this. Moreover, recursive calls are by nature not pre-fetchable (without knowing a few tricks), so there really isn't any way to tell ActiveRecord, or using 'vanilla' SQL for that matter, to walk the chain and figure out which parents are needed until after it's already done that (making the point moot). All this leads to a nightmare of an N+1 situation, as you are experiencing.

Some solutions:

There are several ways to mitigate or eliminate the N+1 problem - in order of implementation complexity:

  • Prefetch up to N levels of parents (assumes you know what max(N) is)

Action.last.parents.includes(parents: {parents: :parents}) # grabs up to 4 levels

  • Skip SQL entirely, load all actions into a hash of Action arrays keyed by associated child_id and use non-ActiveRecord methods to aggregate what you need using simple Ruby. This will deteriorate quickly as your data grows, but it might be good enough for you - at least for the moment.

  • Use a schema that would let you determine the ancestry tree in advance, and use that utility to help with this calculation. The example given by @bbozo is one way to do this - you can also explore such gems as ancestry, acts_as_tree, awesome_nested_set, closure_tree and possibly others to help you out with this.

  • Use a database specific function that actually does the recursive calculation in a single call. PostgreSQL, Oracle, and MS-SQL have this capability, while MySQL and SQLite do not. This will probably get you the best performance, but can be complex to write using just the ActiveRecord query interface.

PinnyM
  • 35,165
  • 3
  • 73
  • 81
0

Have you tried to memoize this?

In model

def should_start
  return my_start if parents.empty?

  @should_start ||= parents.map(&:should_finish).sort.last
end
Austio
  • 5,939
  • 20
  • 34
  • I've tried to memorize it, but `@should_start` is available only within the instance, so if I have 1000 actions... that's a lot of queries... I don't understand why rails is not using the loaded collection. – gabrielhilal Jan 26 '16 at 17:28
  • You should look at the map, that is going to do another query to parents as well. Sounds like if you put this in an external object to mediate the exchange you could memoize it properly and do what you are wanting. – Austio Jan 26 '16 at 20:15