Is there a good way to append words in Spark?

Question

Here is an example.

Dataset: dataset.txt

1 banana kiwi orange melon

Code

scala> val table = sc.textFile("dataset.txt").map(_.split(" "))

scala> table.take(1)

res0: Array[Array[String]] = Array(Array(1, banana, kiwi, orange, melon))

scala> val pairSet = table.map{case Array(key,b,k,o,m) => (key, b+" "+k+" "+o+" "+m)}

scala> pairSet.take(1)

res1: Array[(String, String)] = Array((1, banana kiwi orange melon))

I wonder whether the part that concatenates the values in pairSet is efficient. Is there a better way?

score 1 · Answer 1 · answered Oct 13 '17 at 06:18


You can split on the first occurrence of a space and create the key and the value from the two parts.

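val table = sc.textFile("dataset.txt").map { x =>
  // limit = 2: split only at the first space, so the rest of the line stays intact
  val splits = x.split(" ", 2)
  // key = first word, value = remainder (splits(1) throws if the line has no space)
  (splits(0), splits(1))
}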
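A minimal, Spark-free sketch of the same per-line logic follows. A local `Seq` stands in for the RDD returned by `sc.textFile`, so it runs without a cluster. The object name `SplitFirstSpace` and the empty-value fallback for lines without a space are illustrative assumptions, not part of the answer.

```scala
// Sketch only: the function below is what would be passed to
// sc.textFile("dataset.txt").map(...); a local Seq replaces the RDD here.
object SplitFirstSpace {
  // Split a line at the first space only: key = first token, value = the rest, untouched.
  // Assumption (not in the original answer): a line with no space gets an empty value
  // instead of throwing ArrayIndexOutOfBoundsException on splits(1).
  def toKeyValue(line: String): (String, String) = {
    val splits = line.split(" ", 2)
    if (splits.length == 2) (splits(0), splits(1)) else (splits(0), "")
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("1 banana kiwi orange melon", "2 apple", "3")
    // prints (1,banana kiwi orange melon), (2,apple) and (3,) on separate lines
    lines.map(toKeyValue).foreach(println)
  }
}
```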

answered Oct 13 '17 at 06:18

vdep

Thank you for your reply! Is your method (`val splits = x.split(" ",2)` followed by `(splits(0), splits(1))`) more efficient than my method (`b+" "+k+" "+o+" "+m`)? – S.Kang Oct 13 '17 at 06:25
Yes, because in your version you split the remaining words apart unnecessarily, only to append them back together later. – vdep Oct 13 '17 at 06:27
Oh yes! Thank you very much for your advice! – S.Kang Oct 13 '17 at 06:29
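To make that difference concrete, the sketch below counts the pieces each approach creates for the sample line. The question's fixed five-field concatenation is written here as `tail.mkString(" ")`, which builds the same value string; the object and method names are hypothetical. It only illustrates that the two-way split never takes the value apart; it does not measure actual speed in a Spark job.

```scala
object SplitComparison {
  val line = "1 banana kiwi orange melon"

  // Question-style: split on every space, then rebuild the value from the pieces.
  def splitThenJoin(s: String): (String, String) = {
    val parts = s.split(" ")               // 5 pieces for the sample line
    (parts.head, parts.tail.mkString(" ")) // re-concatenate the 4 fruit names
  }

  // Answer-style: stop after the first space; the value is never split.
  def splitOnce(s: String): (String, String) = {
    val parts = s.split(" ", 2) // 2 pieces: key and untouched remainder
    (parts(0), parts(1))
  }

  def main(args: Array[String]): Unit = {
    println(line.split(" ").length)                 // 5
    println(line.split(" ", 2).length)              // 2
    println(splitThenJoin(line) == splitOnce(line)) // true: same pair either way
  }
}
```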

score 1 · Answer 2 · answered Oct 13 '17 at 06:19


Your approach only works if every line always splits into the same number of fields (exactly five); any other line makes the pattern match throw a `scala.MatchError`. You can also try this:

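val table = sc.textFile("dataset.txt")
// RDD[(String, String)]: key = first word, value = the rest of the line
val pairedDF = table.map { line =>
  val array = line.split(" ", 2) // stop splitting after the first space
  (array(0), array(1))
}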

This way, the array is not restricted to a fixed size after splitting.
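The fixed-size restriction shows up as soon as a line has fewer than five fields. This plain-Scala sketch (no Spark; the object name and the sample line `2 apple pear` are hypothetical) shows the question's pattern failing with a `scala.MatchError` while the `split(" ", 2)` version still returns a pair. Note that the flexible version still assumes at least one space per line.

```scala
import scala.util.Try

object FixedVsFlexible {
  // The question's pattern: only matches arrays with exactly five elements.
  def fixedArity(fields: Array[String]): (String, String) = fields match {
    case Array(key, b, k, o, m) => (key, b + " " + k + " " + o + " " + m)
  }

  // The answer's approach: any line containing at least one space works.
  def flexible(line: String): (String, String) = {
    val array = line.split(" ", 2)
    (array(0), array(1))
  }

  def main(args: Array[String]): Unit = {
    val shortLine = "2 apple pear" // hypothetical line with only three fields
    println(Try(fixedArity(shortLine.split(" "))).isFailure) // true (scala.MatchError)
    println(flexible(shortLine))                             // (2,apple pear)
  }
}
```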

Hope this works fine for you.

Thanks

answered Oct 13 '17 at 06:19

Akash Sethi

Thank you for your reply! Is your method (`val array = line.split(" ", 2)` followed by `(array(0), array(1))`) more efficient than my method (`b+" "+k+" "+o+" "+m`)? – S.Kang Oct 13 '17 at 06:25
Yes, because it stops splitting the value after the first space. – Akash Sethi Oct 13 '17 at 06:27

Oh yes! Thank you very much for your advice! – S.Kang Oct 13 '17 at 06:29
