How do I extract each word from a text file in Scala?

Question

I'm fairly new to Scala. I have a text file containing a single line of five words separated by semicolons (;). I want to extract each word, strip the surrounding whitespace, convert everything to lowercase, and then access each word by its index. This is how I approached it:

newListUpper2.txt contains (Bed;  chairs;spoon; CARPET;curtains )

    val file = sc.textFile("newListUpper2.txt")
    val lower = file.map(x=>x.toLowerCase)
    val result = lower.flatMap(x=>x.trim.split(";"))
    result.collect.foreach(println)

Below is a copy of the REPL session from when I ran the code:

    scala> val file = sc.textFile("newListUpper2.txt")
    file: org.apache.spark.rdd.RDD[String] = newListUpper2.txt MapPartitionsRDD[5] at textFile at <console>:24
    scala> val lower = file.map(x=>x.toLowerCase)
    lower: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[6] at map at <console>:26
    scala> val result = lower.flatMap(x=>x.trim.split(";"))
    result: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7] at flatMap at <console>:28
    scala> result.collect.foreach(println)
    bed
     chairs
    spoon
     carpet
    curtains
    scala> result(0)
    <console>:31: error: org.apache.spark.rdd.RDD[String] does not take parameters
           result(0)

The words are not trimmed, and passing an index to get the word at that position gives an error. This is the result I expect when I pass the index of each word:

    result(0) = bed
    result(1) = chairs
    result(2) = spoon
    result(3) = carpet
    result(4) = curtains

What am I doing wrong?

score 2 · Accepted Answer · answered Jan 08 '20 at 07:15

newListUpper2.txt contains (Bed;  chairs;spoon; CARPET;curtains )

    val file = sc.textFile("newListUpper2.txt")
    val lower = file.map(x=>x.toLowerCase)
    // Here x = `bed;  chairs;spoon; carpet;curtains`. x.trim does not help:
    // trim only removes whitespace at the head and tail of the whole line,
    // so the spaces after each ';' are still there when the line is split.
    val result = lower.flatMap(x=>x.trim.split(";"))
    result.collect.foreach(println)

Try this instead, which splits first and then trims each piece:

    val result = lower.flatMap(x=>x.split(";").map(x=>x.trim))
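The difference between the two orderings can be checked on a plain string, without Spark. This is only a minimal sketch using the sample line from the question; the object name `TrimDemo` and the value names are made up for illustration.

```scala
object TrimDemo {
  // Sample line from the question, already lowercased.
  val line: String = "bed;  chairs;spoon; carpet;curtains "

  // trim first, then split: only the whitespace at the ends of the whole
  // line is removed, so the padding after each ';' stays inside the pieces.
  val trimmedFirst: List[String] = line.trim.split(";").toList
  // List("bed", "  chairs", "spoon", " carpet", "curtains")

  // split first, then trim every piece: each word is cleaned on its own.
  val splitFirst: List[String] = line.split(";").map(_.trim).toList
  // List("bed", "chairs", "spoon", "carpet", "curtains")
}
```

The same `split(";").map(_.trim)` expression is what goes inside the `flatMap` on the RDD.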

score 1 · Answer 2 · answered Jan 08 '20 at 08:25

1) Issue 1

    scala> result(0)
    <console>:31: error: org.apache.spark.rdd.RDD[String] does not take parameters

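result is an RDD, and an RDD cannot be indexed with result(0). To look at its contents, bring the elements to the driver first, for example with result.take(10).foreach(println). Note that show(10,false) exists only on DataFrames and Datasets, not on RDDs.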
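If the data is small enough to fit on the driver, one way to get index access while staying in Spark is to collect the RDD into a local array and index that. The sketch below is an illustration under assumptions, not the only way to do it: `IndexRddDemo` and `collectWords` are hypothetical names, the local-mode `SparkContext` is created only so the example stands alone (in spark-shell, `sc` already exists), and `sc.parallelize` stands in for `sc.textFile("newListUpper2.txt")`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IndexRddDemo {
  // Splits each line on ';', trims and lowercases every word, and copies the
  // result to the driver as a local Array[String], which supports words(i).
  def collectWords(sc: SparkContext, lines: Seq[String]): Array[String] = {
    val file = sc.parallelize(lines) // stand-in for sc.textFile(...)
    val result = file.flatMap(x => x.toLowerCase.split(";").map(_.trim))
    result.collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val conf = new SparkConf().setAppName("IndexRddDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val words = collectWords(sc, Seq("Bed;  chairs;spoon; CARPET;curtains "))
      println(words(0)) // bed
      println(words(3)) // carpet
    } finally {
      sc.stop()
    }
  }
}
```

In the REPL the short form is `result.collect()(0)`. The empty parentheses matter: `result.collect(0)` would be read as passing an argument to `collect` itself. Only collect when the RDD is small, since everything is copied to the driver.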

2) Issue 2 - To get index access such as result(0), result(1), and so on, read the file into a local List:

    scala> var result = scala.io.Source.fromFile("/path/to/File").getLines().flatMap(x=>x.split(";").map(x=>x.trim)).toList
    result: List[String] = List(Bed, chairs, spoon, CARPET, curtains)

    scala> result(0)
    res21: String = Bed

    scala> result(1)
    res22: String = chairs
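The version above keeps the original case (Bed, CARPET) and never closes the file. A variant that also lowercases the words and closes the source might look like the sketch below. `ReadWords` and `readWords` are hypothetical names, and the temporary file is only a stand-in for newListUpper2.txt so that the example runs on its own.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.io.Source

object ReadWords {
  // Reads a semicolon-separated text file and returns the words
  // trimmed and lowercased, in file order.
  def readWords(path: Path): List[String] = {
    val source = Source.fromFile(path.toFile, "UTF-8")
    try {
      source.getLines()
        .flatMap(_.split(";"))
        .map(_.trim.toLowerCase)
        .filter(_.nonEmpty) // drop empty pieces, e.g. from ";;"
        .toList             // force evaluation before the source is closed
    } finally {
      source.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical temp file holding the sample line from the question.
    val tmp = Files.createTempFile("newListUpper2", ".txt")
    Files.write(tmp, "Bed;  chairs;spoon; CARPET;curtains ".getBytes(StandardCharsets.UTF_8))
    try {
      val result = readWords(tmp)
      println(result(0)) // bed
      println(result(3)) // carpet
    } finally {
      Files.delete(tmp)
    }
  }
}
```

This reads the whole file on one machine, so it fits a small local file like the one in the question rather than a dataset that needs Spark.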
