How can I roll up the data of one file on one column before joining it to another file in Spark? I need to join the two files so that the key column contains no repeated values. Example: data of the first file:
name,country,marks,score
a,India,12,11
b,Australia,10,9
a,England,12,10
a,America,11,18
b,India,16,12
c,America,17,22
Data of the second file:
name2,City,ID
a,Delhi,we1
b,Bangalore,we2
a,Gurgaon,we1
a,Mumbai,we3
c,Delhi,we4
After rolling up the first file, it should look like this:
name,country,marks,score
a,India England America,12 12 11,11 10 18
b,Australia India,10 16,9 12
c,America,17,22
After rolling up the second file, it should look like this:
name2,City,ID
a,Delhi Gurgaon Mumbai,we1 we1 we3
b,Bangalore,we2
c,Delhi,we4
After rolling up both files, I want to do a left join, a right join, and other types of joins in Spark.