I would like to do colocated joins on two tables in SnappyData. To further speed up the join, would it help if I also created indexes on the joining columns of the two tables?
More specifically, the two tables are quite large. Ideally, at the large scale the join would be a pair-wise partitioned join, and within each pair of partitions an index nested loop join would be used instead of a naive nested loop join.
I wasn't able to find an example or tutorial for this, so any explanation or pointers would be greatly appreciated.
Thanks in advance!
Update:
The two tables are large in terms of number of rows, and they have very few columns (3 - 4 columns, all integer types):
`Table1(Col_A, Col_B), Table2(Col_B, Col_C)`,
and I would like to join `Table1` and `Table2` on `Col_B` to get a result like `Table3(Col_A, Col_B, Col_C)`. I would therefore prefer to horizontally partition the two joining tables on the joining column `Col_B` (using row tables) instead of using column tables, and to use a colocated join to reduce data shuffling.
Even after partitioning, the partitions might still be too large, so I'm wondering whether I can create an index on `Col_B` in each partition independently and use it for an index join. It seems to me that in SnappyData I can only create an index on the whole column, not on each partition independently.
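
To make the strategy I have in mind concrete, here is a minimal Python sketch. It is not SnappyData code and says nothing about how SnappyData actually executes joins. All names (`partition_by`, `build_index`, `colocated_index_join`, `NUM_PARTITIONS`) are hypothetical, and a plain in-memory dict stands in for the per-partition index:

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, for illustration only


def partition_by(rows, key_pos, num_partitions=NUM_PARTITIONS):
    """Hash-partition rows on the column at position key_pos."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key_pos]) % num_partitions].append(row)
    return parts


def build_index(rows, key_pos):
    """Build an index local to one partition: key value -> matching rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[key_pos]].append(row)
    return index


def colocated_index_join(table1, table2):
    """Join Table1(Col_A, Col_B) with Table2(Col_B, Col_C) on Col_B."""
    # Both tables use the same partitioning on Col_B, so matching rows
    # always land in the same partition pair (no cross-partition shuffle).
    parts1 = partition_by(table1, key_pos=1)
    parts2 = partition_by(table2, key_pos=0)
    result = []
    for p1, p2 in zip(parts1, parts2):
        index2 = build_index(p2, key_pos=0)  # index only this partition
        for col_a, col_b in p1:
            # Index lookup replaces the inner loop of a naive nested loop join.
            for _, col_c in index2.get(col_b, []):
                result.append((col_a, col_b, col_c))
    return result


table1 = [(1, 10), (2, 20), (3, 10)]
table2 = [(10, 100), (20, 200), (30, 300)]
table3 = sorted(colocated_index_join(table1, table2))
```

The outer loop over partition pairs corresponds to the colocated join, and the dict lookup inside each pair is the per-partition index access I am asking about.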