
While getting started with spark-jobserver, I learned that DataFrames can be flattened as in Spark flattening out dataframes, but this still does not satisfy the job result serialization requirements of https://github.com/spark-jobserver/spark-jobserver#job-result-serialization

If this is the result I get from Spark:

Array([1364767200000,1.9517414004122625E15], [1380578400000,6.9480992806496976E16])

how could I map it into a format that serializes usefully? And how could I add additional fields?
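
For illustration, this is roughly the shape I am after (only a sketch; the variable df and the column types, a Long timestamp and a Double value, are my assumptions):

    // Sketch: collect the (timestamp, value) rows into a string-keyed Map,
    // which should serialize cleanly and leaves room for extra fields.
    val asMap: Map[String, String] = df.rdd
      .map(row => row.getLong(0).toString -> row.getDouble(1).toString)
      .collect()
      .toMap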

Experimenting with an array like Array([1,2], [3,4]) only results in an error.

Currently, following Spark flattening out dataframes, I get the following serialization:

 "result": "Map(1364767200000 -> 1.9517414004122625E15, 1380578400000 -> 6.9480992806496976E16)"

which obviously is not parsed by the job server.

As far as I understand it, the nested arrays (from collect) cannot be serialized properly. However, this Map should be serializable. What is wrong?

Edit:

The JSON encoding only seems to work if I return a properly typed list:

    case class Student(name: String, age: Int)
    List(Student("Torcuato", 27), Student("Rosalinda", 34))

The result is "result": [["Torcuato", 27], ["Rosalinda", 34]]. But already for

    val dataFrame: DataFrame = sql.createDataFrame(
      sql.sparkContext.parallelize(List(Student("Torcuato", 27), Student("Rosalinda", 34))))
    dataFrame.collect

I get "result": ["[Torcuato,27]", "[Rosalinda,34]"] which is some strange kind of Json.

As far as I understand the problem, I would need to map all of my results into a custom class. How would I achieve this?
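
For example, something along these lines is what I imagine (a sketch only; I am assuming the Row fields can be read back by name):

    // Sketch: map each Row back into the case class before collecting,
    // so the job returns a typed list instead of Row.toString output.
    val students: Array[Student] = dataFrame.rdd
      .map(row => Student(row.getAs[String]("name"), row.getAs[Int]("age")))
      .collect()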


1 Answer


The answer is that, for now, apparently only Maps of strings are supported. Thus, converting the DataFrame to an RDD of Maps, as in Convert DataFrame to RDD[Map] in Scala, results in clean serialization.
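
A minimal sketch of that conversion (assuming a DataFrame df; the column names are read from the schema, and nulls become empty strings here):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.DataFrame

    // Sketch: turn each Row into a Map[String, String] keyed by column
    // name; string-keyed, string-valued Maps serialize cleanly.
    def toMaps(df: DataFrame): RDD[Map[String, String]] = {
      val columns = df.columns
      df.rdd.map { row =>
        columns.map(c =>
          c -> Option(row.getAs[Any](c)).map(_.toString).getOrElse("")
        ).toMap
      }
    }

The job could then return toMaps(df).collect() as its result.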
