performance issues while processing 2 tables in lockstep based on orderedBy from-to

Question

Title is probably not very clear so let me explain.

I want to process a in-process join (nodeJs) on 2 tables*, Session and SessionAction. (1-N)

Since these tables are rather big (millions of records both) my idea was to get slices based on an orderBy sessionId (which they both share), and sort of lock-step walk through both tables in batches.

This however proves to be awefully slow. I'm using pseudo code as follows for both the tables to get the batches:

table('x').orderBy({index:"sessionId"}.filter(row.sessionId > start && row.sessionId < y)

It seems that even though I'm essentially filtering on a attribute sessionId which has got an index, the query planner is not smart enough to see this and every query does a complete tablescan to do the orderby before filtering afterwards (or so it seems)

Of course, this is incredibly wasteful but I don't see another option. E.g.:

Order after filter is not supported by Rethink.
Getting a slice of the ordered table doesn't work either, since slice-enumeration (i.e.: the xth until the yth record) for lack of a better work doesn't add up between the 2 tables.

Questions:

Is my approach indeed expected to be slow, due to having to do a table scan at each iteration/batch?
If so, how could I design my queries to get it working faster?

*) It's too involved to do it using Rethink Reql only.

score 0 · Accepted Answer · answered Aug 17 '15 at 20:23

0

filter is never indexed in RethinkDB. (In general a particular command will only use a secondary index if you pass index as one of its optional arguments.) You can write that query like this to avoid scanning over the whole table:

r.table('x').orderBy({index: 'sessionID'}).between(start, y, {index: 'sessionId'})

answered Aug 17 '15 at 20:23

mlucy

5,249
1
17
21

Thanks. Came up with the 'inverse' couple of minutes ago! `r.table('x').).between(start, y, {index: 'sessionId'}).orderBy({index: 'sessionID'}` any thoughts on how they would compare or would the query planner treat them the same? – Geert-Jan Aug 17 '15 at 21:53
Usually order matters in RethinkDB, but in this particular case they should be identical. – mlucy Aug 28 '15 at 19:26

performance issues while processing 2 tables in lockstep based on orderedBy from-to

1 Answers1