
I have a dataframe with the following schema:

root
 |-- _id: long (nullable = true)
 |-- student_info: struct (nullable = true)
 |    |-- firstname: string (nullable = true)
 |    |-- lastname: string (nullable = true)
 |    |-- major: string (nullable = true)
 |    |-- honour_roll: boolean (nullable = true)
 |-- school_name: string (nullable = true)

How can I get a list of columns under "student_info" only? I.e. ["firstname","lastname","major","honour_roll"]

All of the following return the list of the struct's field names. The .columns approach looks cleanest.

df.select("student_info.*").columns
df.schema["student_info"].dataType.names
df.schema["student_info"].dataType.fieldNames()
df.select("student_info.*").schema.names
df.select("student_info.*").schema.fieldNames()