0

Title is probably not very clear so let me explain.

I want to process a in-process join (nodeJs) on 2 tables*, Session and SessionAction. (1-N)

Since these tables are rather big (millions of records both) my idea was to get slices based on an orderBy sessionId (which they both share), and sort of lock-step walk through both tables in batches.

This however proves to be awefully slow. I'm using pseudo code as follows for both the tables to get the batches:

table('x').orderBy({index:"sessionId"}.filter(row.sessionId > start && row.sessionId < y)

It seems that even though I'm essentially filtering on a attribute sessionId which has got an index, the query planner is not smart enough to see this and every query does a complete tablescan to do the orderby before filtering afterwards (or so it seems)

Of course, this is incredibly wasteful but I don't see another option. E.g.:

  1. Order after filter is not supported by Rethink.
  2. Getting a slice of the ordered table doesn't work either, since slice-enumeration (i.e.: the xth until the yth record) for lack of a better work doesn't add up between the 2 tables.

Questions:

  1. Is my approach indeed expected to be slow, due to having to do a table scan at each iteration/batch?
  2. If so, how could I design my queries to get it working faster?

*) It's too involved to do it using Rethink Reql only.

Geert-Jan
  • 18,623
  • 16
  • 75
  • 137

1 Answers1

0

filter is never indexed in RethinkDB. (In general a particular command will only use a secondary index if you pass index as one of its optional arguments.) You can write that query like this to avoid scanning over the whole table:

r.table('x').orderBy({index: 'sessionID'}).between(start, y, {index: 'sessionId'})
mlucy
  • 5,249
  • 1
  • 17
  • 21
  • Thanks. Came up with the 'inverse' couple of minutes ago! `r.table('x').).between(start, y, {index: 'sessionId'}).orderBy({index: 'sessionID'}` any thoughts on how they would compare or would the query planner treat them the same? – Geert-Jan Aug 17 '15 at 21:53
  • Usually order matters in RethinkDB, but in this particular case they should be identical. – mlucy Aug 28 '15 at 19:26