Given the line `WARN:router1 warning in Japan`, how can I split it on both the ":" and " " delimiters in a single RDD, and how can I then create a DataFrame from that RDD with the following info: WARN router1 Japan?
Does this answer your question? [Scala : How to split words using multiple delimeters](https://stackoverflow.com/questions/45758378/scala-how-to-split-words-using-multiple-delimeters) – SternK May 19 '20 at 08:09
2 Answers
First split the string with a regex and create the RDD as an `RDD[String]`. Creating a DataFrame from an arbitrary RDD normally requires a schema, but because this RDD is an `RDD[String]` you can create a Dataset directly and then convert it to a DataFrame:
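```scala
// Assumes spark-shell, where `spark` (a SparkSession) is already defined
import spark.implicits._

val str = "WARN:router1 warning in Japan"
// Split on ":" or a single whitespace character
val arr = str.split("(:|\\s)")
val rdd = spark.sparkContext.parallelize(arr) // RDD[String]
val ds = spark.createDataset(rdd)             // Dataset[String]
ds.toDF().show()                              // DataFrame with one "value" column
```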
gives
```
+-------+
|  value|
+-------+
|   WARN|
|router1|
|warning|
|     in|
|  Japan|
+-------+
```
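This answer notes that going from an RDD to a DataFrame normally means supplying a schema. A minimal sketch of that route, assuming Spark 2.x or later in a standalone Scala application rather than spark-shell: the object name `SplitWithSchema`, the helper `tokensToDF` and the column name `value` are illustrative choices, not part of the original answer. It also swaps the regex for `[:\s]+`, so runs of spaces do not produce empty tokens.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object SplitWithSchema {
  // Explicit one-column schema; "value" is chosen only to match the output above.
  val schema: StructType =
    StructType(Seq(StructField("value", StringType, nullable = false)))

  // Hypothetical helper: split one line into tokens and build a DataFrame with an explicit schema.
  def tokensToDF(spark: SparkSession, line: String): DataFrame = {
    // Split on ":" or any run of whitespace, dropping empty tokens (e.g. from a leading separator).
    val tokens = line.split("[:\\s]+").filter(_.nonEmpty)
    // createDataFrame(RDD[Row], StructType) expects Rows, so wrap each token in a Row.
    val rows: RDD[Row] = spark.sparkContext.parallelize(tokens.toSeq).map(t => Row(t))
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only.
    val spark = SparkSession.builder().master("local[*]").appName("split-with-schema").getOrCreate()
    try tokensToDF(spark, "WARN:router1 warning in Japan").show()
    finally spark.stop()
  }
}
```

`createDataset` infers the schema from the implicit `Encoder`, while `createDataFrame(RDD[Row], StructType)` takes it explicitly, which is mostly useful when the columns are only known at runtime.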

Emiliano Martinez
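Replace ":" with a space, split each line on spaces with `flatMap`, and convert the result with `toDF`:

```scala
// Assumes spark-shell, where `sc` (SparkContext) and `spark` (SparkSession) are predefined
import spark.implicits._

val data = Seq("WARN:router1 warning in Japan")
val rdd = sc.parallelize(data) // RDD of Strings
val dataDF = rdd
  .flatMap(line => line.replace(":", " ").split(" ")) // one element per word
  .toDF("value") // DataFrame
dataDF.show()
```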
output
```
+-------+
|  value|
+-------+
|   WARN|
|router1|
|warning|
|     in|
|  Japan|
+-------+
```
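Both answers produce one row per token. If "WARN router1 Japan" in the question instead means one row with separate fields, each line could be mapped to a case class before calling `toDF`. This is a sketch under an assumed layout: the level comes before the colon, the host is the next token, and the last word is the location. `ParseLogLines`, `LogEntry` and its field names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ParseLogLines {
  // Hypothetical record for lines like "WARN:router1 warning in Japan".
  final case class LogEntry(level: String, host: String, location: String)

  // Assumed layout: level, host, ..., location. Returns None if there are fewer than 3 tokens.
  def parse(line: String): Option[LogEntry] = {
    val tokens = line.trim.split("[:\\s]+")
    if (tokens.length >= 3) Some(LogEntry(tokens(0), tokens(1), tokens.last)) else None
  }

  def toDF(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._
    // flatMap over the parsed Option (as a List) silently drops lines that do not parse.
    spark.sparkContext
      .parallelize(lines)
      .flatMap(line => parse(line).toList)
      .toDF() // columns named after the case class fields
  }

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only.
    val spark = SparkSession.builder().master("local[*]").appName("parse-log-lines").getOrCreate()
    try toDF(spark, Seq("WARN:router1 warning in Japan")).show()
    finally spark.stop()
  }
}
```

Lines that yield fewer than three tokens are dropped rather than failing the job; whether that is the right policy depends on the data.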

Chema