I am learning how to use Spark and Scala, and I am trying to write a Scala Spark program that receives an input of string values such as:
12 13
13 14
13 12
15 16
16 17
17 16
I initially create my pair RDD with:
val myRdd = sc.textFile(args(0)).map(line => (line.split("\\s+")(0), line.split("\\s+")(1))).distinct()
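For context, this is roughly how the whole thing looks on my end, splitting each line only once into tokens (the object name PairDedup and the app name are just what I happened to call them):

import org.apache.spark.{SparkConf, SparkContext}

object PairDedup {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PairDedup")
    val sc = new SparkContext(conf)

    // Split each line once and build the pair from the first two tokens.
    val myRdd = sc.textFile(args(0))
      .map { line =>
        val tokens = line.split("\\s+")
        (tokens(0), tokens(1))
      }
      .distinct()
  }
}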
Now this is where I am getting stuck. In the set of values there are instances like (12,13) and (13,12). In the context of the data these two are the same instance. Simply put, (a,b) = (b,a).
I need to create an RDD that has one or the other, but not both. So the result, once this is done, would look something like this:
12 13
13 14
15 16
16 17
The only way I can see to do it right now is to take each tuple and compare it with every other tuple in the RDD to check whether it is the same data, just swapped.
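Is something along these lines the right direction? The idea would be to reorder the values inside each tuple into a canonical form (smaller value first) before calling distinct, so that (13,12) and (12,13) collapse to the same pair. This is just a rough sketch of the idea, not tested:

// Put each pair into a canonical order so (a,b) and (b,a) become identical,
// then let distinct remove the duplicates. The comparison here is on strings,
// which should be fine as long as it is applied consistently to every pair.
val normalized = myRdd
  .map { case (a, b) => if (a <= b) (a, b) else (b, a) }
  .distinct()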