I am using Java and my current datasets look like this:
dataset d1 (where column c1 contains both int and string values):
c1, c2, c3
12, ab, a
xy, ah, ab
19, a, ad
a, b, c
dataset d2:
c1, c2, c3
12, ab, a
10, ah, ab
19, a, xy
1, b, c
Now I want to join the two datasets with an OR condition, like:
d1.col("c1").equalTo(d2.col("c1")).or(d1.col("c1").equalTo(d2.col("c3")))
I have tried the above join and it works with smaller datasets, but when we run it on bigger ones (around 8 billion rows × 1 million rows) it runs forever. I am not sure why this happens. Any leads?
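My guess (an assumption on my part, not something I have verified in the Spark UI) is that the .or() predicate prevents Spark from treating this as a plain equi-join, so it falls back to a much slower nested-loop style comparison. The usual workaround I have read about is to split the OR into two separate equi-joins and union the results, deduplicating rows that match both branches. Below is a plain-Java sketch of that idea — not Spark code, just hash maps standing in for the two equi-join branches; the class and helper names are my own:

```java
import java.util.*;

public class OrJoinSketch {
    // Each row is a String[] of [c1, c2, c3].
    static List<String[]> join(List<String[]> d1, List<String[]> d2) {
        // Build one hash index per equi-join branch (d2.c1 and d2.c3),
        // so each d1 row probes in O(1) instead of scanning all of d2.
        Map<String, List<String[]>> byC1 = new HashMap<>();
        Map<String, List<String[]>> byC3 = new HashMap<>();
        for (String[] r : d2) {
            byC1.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
            byC3.computeIfAbsent(r[2], k -> new ArrayList<>()).add(r);
        }
        List<String[]> out = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // dedupe pairs matched by both branches
        for (String[] l : d1) {
            for (String[] r : byC1.getOrDefault(l[0], List.of())) {
                if (seen.add(Arrays.toString(l) + "|" + Arrays.toString(r)))
                    out.add(concat(l, r));
            }
            for (String[] r : byC3.getOrDefault(l[0], List.of())) {
                if (seen.add(Arrays.toString(l) + "|" + Arrays.toString(r)))
                    out.add(concat(l, r));
            }
        }
        return out;
    }

    static String[] concat(String[] a, String[] b) {
        String[] c = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, c, a.length, b.length);
        return c;
    }

    public static void main(String[] args) {
        List<String[]> d1 = List.of(
            new String[]{"12", "ab", "a"}, new String[]{"xy", "ah", "ab"},
            new String[]{"19", "a", "ad"}, new String[]{"a", "b", "c"});
        List<String[]> d2 = List.of(
            new String[]{"12", "ab", "a"}, new String[]{"10", "ah", "ab"},
            new String[]{"19", "a", "xy"}, new String[]{"1", "b", "c"});
        for (String[] row : join(d1, d2))
            System.out.println(Arrays.toString(row));
    }
}
```

In Spark terms this would correspond to two d1.join(d2, ...) calls on each equality separately, followed by a union and dropDuplicates — whether that actually helps at 8-billion-row scale is exactly what I am unsure about.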
I have also tried the following:
when(condition, value1).otherwise(value2)
But that didn't work out either. I have also tried googling it, with no luck, and I have seen this Stack Overflow post, but it is not working for me: Conditional Join in Spark DataFrame