My join is executing as follows:
SELECT
  left.*,
  right.*
FROM `/foo/bar/baz` AS left
JOIN `/foo2/bar2/baz2` AS right
  ON left.something = right.something
Dataset: /foo/bar/baz
+-----------+-------+
| something | val_1 |
+-----------+-------+
| a         | 1     |
| a         | 2     |
| a         | 3     |
| a         | 4     |
| a         | 5     |
| a         | 6     |
| a         | ...   |
| a         | 10K   |
| b         | 1     |
| b         | 2     |
| b         | 3     |
+-----------+-------+
Dataset: /foo2/bar2/baz2
+-----------+-------+
| something | val_2 |
+-----------+-------+
| a         | 1     |
| a         | 2     |
| b         | 1     |
| b         | 2     |
| b         | 3     |
+-----------+-------+
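To make the imbalance in the two samples above concrete, here is a rough tally of how many rows the inner join would emit per key. It is only a sketch in plain Python, not part of the actual job: the names `left_counts`, `right_counts`, and `join_rows` are made up for illustration, and it assumes "10K" means exactly 10,000 rows for key a in /foo/bar/baz.

```python
from collections import Counter

# Row counts per join key, read off the sample tables.
# Assumption: "10K" means 10,000 rows for key "a" on the left side.
left_counts = Counter({"a": 10_000, "b": 3})
right_counts = Counter({"a": 2, "b": 3})

# An inner equi-join emits left_count * right_count rows for each shared key.
join_rows = {
    key: left_counts[key] * right_counts[key]
    for key in left_counts.keys() & right_counts.keys()
}

total = sum(join_rows.values())
for key in sorted(join_rows):
    print(f"{key}: {join_rows[key]} rows out of {total}")
```

Under that assumption, almost all of the join output belongs to key a, and in a shuffle-based join every row with the same key lands in the same partition.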
I am getting OOM errors on my executors and I don't want to throw more memory at the executors unnecessarily. How do I ensure this join executes successfully without burning extra resources?