3

I have the line `WARN:router1 warning in Japan`. How can I split this line on the delimiters ":" and " " in a single RDD, and how can I then create a DataFrame from that RDD containing the values WARN, router1 and Japan?

  • 1
    Does this answer your question? [Scala : How to split words using multiple delimeters](https://stackoverflow.com/questions/45758378/scala-how-to-split-words-using-multiple-delimeters) – SternK May 19 '20 at 08:09

2 Answers

1

First split the string with a regex and create an RDD[String] from the result. Creating a DataFrame from an RDD normally requires a schema, but because this is an RDD[String] you can create a Dataset directly and then convert it to a DataFrame:

import spark.implicits._

val str = "WARN:router1 warning in Japan"
val arr = str.split("(:|\\s)")

val rdd = spark.sparkContext.parallelize(arr)
val ds = spark.createDataset(rdd)

ds.toDF().show()

gives

+-------+
|  value|
+-------+
|   WARN|
|router1|
|warning|
|     in|
|  Japan|
+-------+
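
A minimal sketch of how the split pattern behaves, in plain Scala with no Spark needed. The character class `[:\\s]` is an equivalent way to write `(:|\\s)`. The second input, with a space after the colon, is a hypothetical variant that is not in the question; it shows why a `+` quantifier may be worth adding if the real log lines are less regular.

```scala
// Split on a colon or any whitespace character using a character class.
val line = "WARN:router1 warning in Japan"
val tokens = line.split("[:\\s]")
// tokens: Array(WARN, router1, warning, in, Japan)

// Hypothetical variant of the input with a space after the colon.
val spacedLine = "WARN: router1 warning in Japan"

// Without a quantifier, the adjacent ':' and ' ' leave an empty token between them.
val withEmpty = spacedLine.split("[:\\s]")
// withEmpty: Array(WARN, "", router1, warning, in, Japan)

// With '+', a run of delimiters is treated as a single separator.
val collapsed = spacedLine.split("[:\\s]+")
// collapsed: Array(WARN, router1, warning, in, Japan)
```

For the exact line in the question both patterns give the same five tokens, so the quantifier only matters when delimiters can appear back to back.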
Emiliano Martinez
0
val data = Seq("WARN:router1 warning in Japan")
val rdd = sc.parallelize(data) // RDD of Strings
import spark.implicits._
val dataDF = rdd
             .flatMap(line => line.replace(":"," ").split(" "))
             .toDF("value") // Dataframe

dataDF.show()

output

+-------+
|  value|
+-------+
|   WARN|
|router1|
|warning|
|     in|
|  Japan|
+-------+
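
The question seems to ask for only WARN, router1 and Japan, possibly as one row per log line rather than one row per token. A rough sketch under that reading follows, in plain Scala. The case class `LogLine`, its field names and the helper `parseLine` are hypothetical, and taking the last token as the location is an assumption about the log format.

```scala
// Hypothetical record type; the field names are assumptions for illustration.
final case class LogLine(level: String, device: String, location: String)

// Keep the first two tokens and the last one.
// Returns None for lines with fewer than three tokens.
def parseLine(line: String): Option[LogLine] = {
  // Split on runs of ':' or whitespace and drop any empty leading token.
  val tokens = line.split("[:\\s]+").filter(_.nonEmpty)
  if (tokens.length >= 3) Some(LogLine(tokens(0), tokens(1), tokens.last))
  else None
}

val parsed = parseLine("WARN:router1 warning in Japan")
// parsed: Some(LogLine(WARN,router1,Japan))
```

In a Spark shell with `spark.implicits._` imported, something like `rdd.flatMap(line => parseLine(line)).toDF()` should then give one row per line, with columns named after the case class fields.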
Chema