-1

How do I optimize equijoin? Table 1 has 500000 rows, Table 2 has 200,000 rows and I need to do equi join between them. The resulting table should have about 20,000,000 rows. It takes about 20 seconds currently to run -

ej[`key;Table1;Table2]

I tried some attributes on the 'key' column to no avail.

Thomas Smyth - Treliant
  • 4,993
  • 6
  • 25
  • 36
  • Is it really that slow? Can you show the output of `meta Table1` and `meta Table2` please? On my average-spec machine `ej` on some random data runs in approximately 1 second resulting in 24,413,284 rows. – Igor Korkhov Mar 11 '16 at 13:31

2 Answers2

0

In case if one of your tables has primary key and join is made on this primary key, then lj should give better results than ej. I would suggest to try

    Table1 lj `key xkey Table2

Also key type is important. In case if key is char[] it is worth to cast it to symbol before joining.

Anton Dovzhenko
  • 2,399
  • 11
  • 16
  • I know left join will be much faster but I need equi join to get all matching pairs in the two tables. The keys are symbols or integers. – user5637363 Mar 13 '16 at 23:21
  • In case if none of the them has primary key I would propose to apply grouping to each of them or both and compare the results then. Like `ej['key;Table1;update 'g#key from Table2]` or ej['key;update 'g#key from Table1;update 'g#key from Table2]`. (in code "backtick" was replaced by ', please, replace back for testing) – Anton Dovzhenko Mar 14 '16 at 00:14
0

Equi-join ej is essentially the same as inner-join ij but accepts the key columns as a parameter. You could instead try ij, keying the right hand table:

Table1 ij `key xkey Table2

Which should improve performance.

Thomas Smyth - Treliant
  • 4,993
  • 6
  • 25
  • 36