
Here is an example.

Dataset: dataset.txt

1 banana kiwi orange melon

Code

scala> val table = sc.textFile("dataset.txt").map(_.split(" "))

scala> table.take(1)

res0: Array[Array[String]] = Array(Array(1, banana, kiwi, orange, melon))

scala> val pairSet = table.map{case Array(key,b,k,o,m) => (key, b+" "+k+" "+o+" "+m)}

scala> pairSet.take(1)

res1: Array[(String, String)] = Array((1,banana kiwi orange melon))

I wonder whether the part that concatenates the values in pairSet is efficient. Is there a better way?

S.Kang

2 Answers


You can split on the first occurrence of a space and create the key and value from the two parts.

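val table = sc.textFile("dataset.txt").map { x =>
  // limit = 2: split only at the first space, e.g. Array("1", "banana kiwi orange melon")
  val splits = x.split(" ", 2)
  (splits(0), splits(1)) // (key, rest of the line)
}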
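The `limit` argument of `String.split` is what makes this work: with a limit of 2, the split is applied at most once, so everything after the first space stays in one piece. Below is a minimal plain-Scala sketch, assuming you only want to check the logic without Spark: a local `Seq` stands in for `sc.textFile("dataset.txt")`, and the helper names `toPair` and `toPairFiveFields` are hypothetical. It also shows that, for the five-field sample line, the result matches the question's approach.

```scala
object SplitOnceDemo {
  // Hypothetical helper: same logic as the answer's map body.
  // split(" ", 2) splits at most once, at the first space.
  def toPair(line: String): (String, String) = {
    val splits = line.split(" ", 2)
    (splits(0), splits(1))
  }

  // The question's approach: split into five fields, then rebuild the value.
  def toPairFiveFields(line: String): (String, String) =
    line.split(" ") match {
      case Array(key, b, k, o, m) => (key, b + " " + k + " " + o + " " + m)
    }

  def main(args: Array[String]): Unit = {
    // A local Seq standing in for the RDD of lines
    val lines = Seq("1 banana kiwi orange melon")
    val pairs = lines.map(toPair)
    println(pairs) // List((1,banana kiwi orange melon))

    // Both approaches produce the same pair for a five-field line
    println(lines.map(toPairFiveFields) == pairs) // true
  }
}
```

In Spark the same function body is simply passed to `RDD.map`; the limit-2 split avoids building the intermediate five-element array and then joining it back together.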
vdep
  • Thank you for your reply! Is your method (`val splits = x.split(" ",2); (splits(0), splits(1))`) more efficient than my method (`b+" "+k+" "+o+" "+m`)? – S.Kang Oct 13 '17 at 06:25
  • Yes, because in your case you unnecessarily split the rest of the line after the first space, only to join the pieces back together later. – vdep Oct 13 '17 at 06:27
  • Oh yes! Thank you very much for your advice! – S.Kang Oct 13 '17 at 06:29

Your approach only works if every line always splits into the same number of fields (exactly five); any other line makes the pattern match fail with a `scala.MatchError`. You can also try this:

val table = sc.textFile("dataset.txt")
// Pair RDD of (first field, remainder of the line)
val pairedDF = table.map { line =>
  val array = line.split(" ", 2) // split only at the first space
  (array(0), array(1))
}

This way you do not require the array to have a fixed size after splitting.

Hope this works for you.
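To see the difference, here is a plain-Scala sketch (a local `Seq` instead of an RDD; the helper names `fixedPair`, `splitPair` and `splitPairOpt` are hypothetical). The fixed five-field pattern throws `scala.MatchError` on a line with a different number of fields, while `split(" ", 2)` accepts any number of fields after the key. Note that it still assumes at least one space: for a line with no space, `array(1)` throws `ArrayIndexOutOfBoundsException`. The `Option`-returning variant is one possible way to skip such lines; it is an assumption of this sketch, not part of the answer above.

```scala
import scala.util.Try

object VariableFieldsDemo {
  // The question's approach: only matches lines with exactly five fields
  def fixedPair(line: String): (String, String) =
    line.split(" ") match {
      case Array(key, b, k, o, m) => (key, b + " " + k + " " + o + " " + m)
    }

  // The answer's approach: key before the first space, rest as the value
  def splitPair(line: String): (String, String) = {
    val array = line.split(" ", 2)
    (array(0), array(1))
  }

  // Hypothetical defensive variant: None for lines without a space
  def splitPairOpt(line: String): Option[(String, String)] =
    line.split(" ", 2) match {
      case Array(key, value) => Some((key, value))
      case _                 => None
    }

  def main(args: Array[String]): Unit = {
    val lines = Seq("1 banana kiwi orange melon", "2 apple pear", "3")

    // "2 apple pear" has three fields, so the five-field match throws MatchError
    println(Try(lines.map(fixedPair)).isFailure) // true

    // "2 apple pear" works, but "3" has no space, so array(1) is out of bounds
    println(Try(lines.map(splitPair)).isFailure) // true

    // flatten drops the None produced for "3"
    println(lines.map(splitPairOpt).flatten)
    // List((1,banana kiwi orange melon), (2,apple pear))
  }
}
```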

Thanks

Akash Sethi