
While getting started with spark-jobserver, I learned that DataFrames can be flattened as in Spark flattening out dataframes, but this still does not satisfy the job result serialization requirements of https://github.com/spark-jobserver/spark-jobserver#job-result-serialization

If this is the result I get from Spark:

Array([1364767200000,1.9517414004122625E15], [1380578400000,6.9480992806496976E16])

how could I map it into a format that serializes usefully? And how could I add additional fields?
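
For illustration, this is roughly the shape I am after (only a sketch; the variable df and the column types, a Long timestamp and a Double value, are my assumptions):

    // Sketch: collect the (timestamp, value) rows into a string-keyed Map,
    // which should serialize cleanly and leaves room for extra fields.
    val asMap: Map[String, String] = df.rdd
      .map(row => row.getLong(0).toString -> row.getDouble(1).toString)
      .collect()
      .toMap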

Experimenting with an array like Array([1,2], [3,4]) only results in an error.

Currently, following Spark flattening out dataframes, I get the following serialization:

 "result": "Map(1364767200000 -> 1.9517414004122625E15, 1380578400000 -> 6.9480992806496976E16)"

which obviously is not parsed by the job server.

As far as I understand it, the nested arrays (from collect) cannot be serialized properly. However, this Map should be serializable. What is wrong?

Edit:

The JSON encoding only seems to work if I return a properly typed list:

    case class Student(name: String, age: Int)
    List(Student("Torcuato", 27), Student("Rosalinda", 34))

The result is "result": [["Torcuato", 27], ["Rosalinda", 34]]. But already for

    val dataFrame: DataFrame = sql.createDataFrame(
      sql.sparkContext.parallelize(List(Student("Torcuato", 27), Student("Rosalinda", 34))))
    dataFrame.collect

I get "result": ["[Torcuato,27]", "[Rosalinda,34]"] which is some strange kind of Json.

As far as I understand the problem, I would need to map all of my results into a custom class. How would I achieve this?
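
For example, something along these lines is what I imagine (a sketch only; I am assuming the Row fields can be read back by name):

    // Sketch: map each Row back into the case class before collecting,
    // so the job returns a typed list instead of Row.toString output.
    val students: Array[Student] = dataFrame.rdd
      .map(row => Student(row.getAs[String]("name"), row.getAs[Int]("age")))
      .collect()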


1 Answer


The answer is that, for now, apparently only Maps of strings are supported. Thus, converting the DataFrame to an RDD of Maps, as in Convert DataFrame to RDD[Map] in Scala, results in clean serialization.
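
A minimal sketch of that conversion (assuming a DataFrame df; the column names are read from the schema, and nulls become empty strings here):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.DataFrame

    // Sketch: turn each Row into a Map[String, String] keyed by column
    // name; string-keyed, string-valued Maps serialize cleanly.
    def toMaps(df: DataFrame): RDD[Map[String, String]] = {
      val columns = df.columns
      df.rdd.map { row =>
        columns.map(c =>
          c -> Option(row.getAs[Any](c)).map(_.toString).getOrElse("")
        ).toMap
      }
    }

The job could then return toMaps(df).collect() as its result.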
