Title is probably not very clear so let me explain.
I want to process a in-process join (nodeJs) on 2 tables*, Session
and SessionAction
. (1-N)
Since these tables are rather big (millions of records both) my idea was to get slices based on an orderBy sessionId
(which they both share), and sort of lock-step walk through both tables in batches.
This however proves to be awefully slow. I'm using pseudo code as follows for both the tables to get the batches:
table('x').orderBy({index:"sessionId"}.filter(row.sessionId > start && row.sessionId < y)
It seems that even though I'm essentially filtering on a attribute sessionId
which has got an index, the query planner is not smart enough to see this and every query does a complete tablescan to do the orderby before filtering afterwards (or so it seems)
Of course, this is incredibly wasteful but I don't see another option. E.g.:
- Order after filter is not supported by Rethink.
- Getting a slice of the ordered table doesn't work either, since
slice-enumeration
(i.e.: the xth until the yth record) for lack of a better work doesn't add up between the 2 tables.
Questions:
- Is my approach indeed expected to be slow, due to having to do a table scan at each iteration/batch?
- If so, how could I design my queries to get it working faster?
*) It's too involved to do it using Rethink Reql only.