
In a DataFrame object in Apache Spark (I'm using the Scala interface), if I'm iterating over its Row objects, is there any way to extract values by name? I can see how to do some really awkward stuff:

import org.apache.spark.sql.Row

def foo(r: Row) = {
  // Build a name -> index map from the row's schema, then look up each field by index
  val ix = (0 until r.schema.length).map(i => r.schema(i).name -> i).toMap
  val field1 = r.getString(ix("field1"))
  val field2 = r.getLong(ix("field2"))
  ...
}
dataframe.map(foo)

I figure there must be a better way - this is pretty verbose, it requires creating an extra structure, and it also requires knowing the types explicitly, which, if incorrect, will produce a runtime exception rather than a compile-time error.

Ken Williams

2 Answers


You can use getAs from org.apache.spark.sql.Row, supplying the expected type as a type parameter:

r.getAs[String]("field1")
r.getAs[Long]("field2")

See the API documentation for getAs(java.lang.String fieldName).
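
For example, the foo from the question could be rewritten with getAs. A minimal sketch, assuming the same field1/field2 names and types as in the question; note the casts still happen at runtime, so a wrong type parameter fails at runtime, not compile time:

import org.apache.spark.sql.Row

def foo(r: Row) = {
  // Look up each field directly by name, no hand-built index map needed
  val field1 = r.getAs[String]("field1")
  val field2 = r.getAs[Long]("field2")
  (field1, field2)
}
dataframe.map(foo)

Row also has getValuesMap if you want several named fields at once, e.g. r.getValuesMap[Any](Seq("field1", "field2")).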

Hongbin Wang
Kexin Nie

This is not supported in the Scala API at this time. The closest thing is the JIRA ticket titled "Support converting DataFrames to typed RDDs".
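
Until something like that lands, you can hand-roll the conversion the ticket describes by mapping each Row into a case class. A minimal sketch, using the field1/field2 names from the question; the getAs casts are still runtime checks, so this does not recover compile-time type safety:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

case class Record(field1: String, field2: Long)

// Convert the untyped Rows into a typed RDD by hand
val typed: RDD[Record] = dataframe.rdd.map { r =>
  Record(r.getAs[String]("field1"), r.getAs[Long]("field2"))
}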

Justin Pihony