I have two datasets:
DATASET1
+-------+--------------------+
|     id|                name|
+-------+--------------------+
|S703401|      Ryan P Cassidy|
|S703401|Christopher J Mat...|
|S703401|      Frank E LaSota|
|S703401|      Ryan P Cassidy|
|S703401|Anthony L Locricchio|
|S703401|         Jason Monte|
+-------+--------------------+
DATASET2
+-------+------+
|     id|   nic|
+-------+------+
|S703401|  RC82|
|S703401|    NA|
|S703401|   FL3|
|S703401|  RC82|
|S703401|    NA|
|S703401|JM2080|
+-------+------+
I want to join them on id so that I get this output:
+-------+--------------------+------+
|     id|                name|   nic|
+-------+--------------------+------+
|S703401|      Ryan P Cassidy|  RC82|
|S703401|Christopher J Mat...|    NA|
|S703401|      Frank E LaSota|   FL3|
|S703401|      Ryan P Cassidy|  RC82|
|S703401|Anthony L Locricchio|    NA|
|S703401|         Jason Monte|JM2080|
+-------+--------------------+------+
I am using Spark in Java:
Dataset joined = dataset1.join(dataset2, "id");
but then I get the Cartesian product of all the rows, like this:
+-------+--------------------+------+
|     id|                name|   nic|
+-------+--------------------+------+
|S703401|      Ryan P Cassidy|JM2080|
|S703401|      Ryan P Cassidy|    NA|
|S703401|      Ryan P Cassidy|  RC82|
|S703401|      Ryan P Cassidy|   FL3|
|S703401|      Ryan P Cassidy|    NA|
|S703401|      Ryan P Cassidy|  RC82|
|S703401|Christopher J Mat...|JM2080|
|S703401|Christopher J Mat...|    NA|
|S703401|Christopher J Mat...|  RC82|
|S703401|Christopher J Mat...|   FL3|
|S703401|Christopher J Mat...|    NA|
|S703401|Christopher J Mat...|  RC82|
|S703401|      Frank E LaSota|JM2080|
|S703401|      Frank E LaSota|    NA|
|S703401|      Frank E LaSota|  RC82|
|S703401|      Frank E LaSota|   FL3|
|S703401|      Frank E LaSota|    NA|
|S703401|      Frank E LaSota|  RC82|
|S703401|      Ryan P Cassidy|JM2080|
|S703401|      Ryan P Cassidy|    NA|
+-------+--------------------+------+
So what am I missing here?
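
To make the difference between the two results concrete, here is a minimal plain-Java sketch (no Spark dependency) of the two pairings, using the rows shown above. The class and method names (JoinSketch, joinOnKey, zipByPosition) are hypothetical and only illustrate the semantics; they are not Spark API. An equi-join on id matches every left row with every right row that has the same id, while the desired output pairs rows by position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class, not part of Spark.
class JoinSketch {

    // Rows of DATASET1 as {id, name}, copied from the table above.
    static List<String[]> names() {
        return Arrays.asList(
            new String[] {"S703401", "Ryan P Cassidy"},
            new String[] {"S703401", "Christopher J Mat..."},
            new String[] {"S703401", "Frank E LaSota"},
            new String[] {"S703401", "Ryan P Cassidy"},
            new String[] {"S703401", "Anthony L Locricchio"},
            new String[] {"S703401", "Jason Monte"});
    }

    // Rows of DATASET2 as {id, nic}, copied from the table above.
    static List<String[]> nics() {
        return Arrays.asList(
            new String[] {"S703401", "RC82"},
            new String[] {"S703401", "NA"},
            new String[] {"S703401", "FL3"},
            new String[] {"S703401", "RC82"},
            new String[] {"S703401", "NA"},
            new String[] {"S703401", "JM2080"});
    }

    // Equi-join on column 0 (id): each left row is paired with
    // every right row that carries the same id.
    static List<String[]> joinOnKey(List<String[]> left, List<String[]> right) {
        List<String[]> out = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : right) {
                if (l[0].equals(r[0])) {
                    out.add(new String[] {l[0], l[1], r[1]});
                }
            }
        }
        return out;
    }

    // Positional pairing: row i of left with row i of right.
    // Assumes the row order of both inputs is meaningful.
    static List<String[]> zipByPosition(List<String[]> left, List<String[]> right) {
        List<String[]> out = new ArrayList<>();
        int n = Math.min(left.size(), right.size());
        for (int i = 0; i < n; i++) {
            out.add(new String[] {left.get(i)[0], left.get(i)[1], right.get(i)[1]});
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(joinOnKey(names(), nics()).size());     // prints 36
        System.out.println(zipByPosition(names(), nics()).size()); // prints 6
    }
}
```

Because all six rows on each side share the id S703401, the key join yields 6 x 6 = 36 rows, which matches the start of the output above (show() prints only the first 20 by default). In Spark itself, a positional pairing would probably need an explicit row-index column on both datasets to join on, since id alone does not identify a row here; that part is an assumption, not something tested in this sketch.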