0

We are thinking about interleaving user-interactions with the user table, the only problem is that there are going to be tens of thousands of interactions per user, so 64MB of one key-value range will most likely seldom fit.

The CockroachDB docs state that:

If the amount of interleaved data stored for any Primary Key value of the root table is larger than a key-value range's maximum size (64MB by default), the interleaved optimizations will be diminished.

  • So, how bad will the performance drop be?
  • Should we even bother to interleave our tables?

The user and the interaction tables will be a little bigger, our current estimate will be around 5 to 10TB. There will be heavy aggregation queries that require joins between the interactions and the user table and then aggregating on some user columns.

Elasticsearch already failed us with it's awful parent-child join performance (and don't get me started on the forced filter-reordering...), so maybe someone has production experience with CockroachDB for such a scenario?

ACimander
  • 1,852
  • 13
  • 17

1 Answers1

2

There is a micro-benchmark https://github.com/cockroachdb/loadgen#interleave for measuring the performance of interleaved vs non-interleaved tables in CockroachDB. Interleaved joins in v1.1.3 are far less than optimal, but significant improvements have been made for the v2.0 release (if you'd like to test just interleaved tables and join queries as an experiment, you can build a binary from the master branch).

In the benchmark, you can mock a parent-child relationship by specifying the number of rows for --merchants and --products in the benchmark. Note: since products is interleaved into merchants, it will uniformly distribute the products rows into merchants.

Generally you should expect interleaved tables to always out-perform non-interleaved tables in CockroachDB v2.0 for parent-child join queries. As mentioned in the docs, you'd be sacrificing performance on table scans (read: queries that require a table scan) for interleaved tables.

Richard
  • 828
  • 1
  • 8
  • 28
  • Thanks for the answer. Looks like we're gonna roll with interleave then, hopefully the data won't grow too fast before 2.0 hits, but I guess we can always throw more machines on the problem :-) – ACimander Dec 19 '17 at 09:19