I tried some basic data types:

val x = Vector("John Smith", 10, "Illinois")
val x = Seq("John Smith", 10, "Illinois")
val x = Array("John Smith", 10, "Illinois")
val x = ...
val x = Seq( Vector("John Smith",10,"Illinois"), Vector("Foo",2,"Bar"))

but none of them offers toDF(), even after import spark.implicits._.

My aim is to use something like x.toDF("name","age","city").show

In the last example toDF exists, but calling it fails with "java.lang.ClassNotFoundException".


NOTES:

  • I am using Spark-shell with Spark v2.2.

  • I need a generic transformation based on column names passed as parameters to toDF(names), not a complex solution such as creating a Vector of case class Person(name: String, age: Long, city: String).

The expected result of show after toDF is:

+----------+---+--------+
|      name|age|    city|
+----------+---+--------+
|John Smith| 10|Illinois|
+----------+---+--------+
Peter Krauss

2 Answers

You should put the values in a tuple to create 3 columns:

scala> Seq(("John Smith", "asd", "Illinois")).toDF("name","age","city").show
+----------+---+--------+
|      name|age|    city|
+----------+---+--------+
|John Smith|asd|Illinois|
+----------+---+--------+
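
A minimal sketch of the same tuple approach with the age kept as an Int, which seems to be what the question's expected output wants. Seq("John Smith", 10, "Illinois") is inferred as Seq[Any], for which Spark has no encoder, whereas a tuple keeps one type per field. The session setup lines and the app name are assumptions for running outside spark-shell; inside the shell, spark and its implicits are already available.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for standalone use; skip these two lines in spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("toDF-sketch").getOrCreate()
import spark.implicits._

// One tuple per row; each tuple field becomes a column with its own type.
val df = Seq(("John Smith", 10, "Illinois")).toDF("name", "age", "city")

df.printSchema() // age is an integer column, name and city are strings
df.show()
```

Adding more rows is just a matter of adding more tuples of the same shape to the Seq.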
chlebek
  • I never used `val x = ("Hello", "world")` with no qualifier... Now I see that it works (!), and you are teaching me that it is valid and that it is called a *tuple*. So perhaps the best and simplest definition of a Spark DataFrame is **"a DF is a Seq of Tuples"** (why does no guide say so?) – Peter Krauss Oct 09 '19 at 18:09

The syntax you are looking for is:

val x = Array("John Smith", "10", "Illinois")
sc.parallelize(x).toDF()

Another way is:

val y = Seq("John Smith", "10", "Illinois")
Seq(y).toDF("value").show()

And this should work too:

Seq(Vector("John Smith","10","Illinois"), Vector("Foo","2","Bar")).toDF()
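
As far as I can tell, the snippets above each produce a single column (named value by default) rather than three named columns. If the column names have to stay a runtime parameter, one possible sketch is to build Row objects and a schema from the list of names and pass both to createDataFrame. The names `names`, `records`, and the all-String schema are assumptions made for illustration, as is the local session setup (not needed in spark-shell).

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumption: a local session for standalone use; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("rows-sketch").getOrCreate()

// Hypothetical inputs: column names and untyped records, all values kept as strings.
val names = Seq("name", "age", "city")
val records = Seq(Seq("John Smith", "10", "Illinois"), Seq("Foo", "2", "Bar"))

// One nullable String field per requested column name.
val schema = StructType(names.map(n => StructField(n, StringType, nullable = true)))

// Each inner Seq becomes one Row; its length must match the schema.
val rows = records.map(r => Row.fromSeq(r))
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

df.show()
```

This keeps every column as a string; typed columns would need a different StructField type per name or a cast afterwards.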
Gaurang Shah