15

I have a dataframe whose schema looks like this:

event: struct (nullable = true)
|    | event_category: string (nullable = true)
|    | event_name: string (nullable = true)
|    | properties: struct (nullable = true)
|    |    | ErrorCode: string (nullable = true)
|    |    | ErrorDescription: string (nullable = true)

I am trying to explode the struct column properties using the following code:

df_json.withColumn("event_properties", explode($"event.properties"))

But it is throwing the following exception:

cannot resolve 'explode(`event`.`properties`)' due to data type mismatch: 
input to function explode should be array or map type, 
not StructType(StructField(IDFA,StringType,true),

How can I explode the properties column?

ZygD
shiva.n404
    @user8371915 As that question has been marked as a duplicate of this question your close vote would now cause cyclical duplicate navigation (and isn't valid if tried now) – Nick is tired Jan 19 '18 at 15:49

3 Answers

12

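explode works only on array or map columns, so you need to convert the fields of the properties struct to an array first and then apply the explode function, as below. Note that array requires all of its elements to share a compatible type, which holds here because both fields are strings.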

import org.apache.spark.sql.functions._
df_json.withColumn("event_properties", explode(array($"event.properties.*"))).show(false)

This should give you the desired result: one row per field value of properties.
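To make the approach above easier to try out, here is a minimal self-contained sketch. The case classes, the object name ExplodeStructSketch, the helper explodeProperties, and the sample values are all assumptions made up for illustration; only the schema and the explode(array(...)) call come from the thread. It assumes a local Spark installation.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode}

// Hypothetical case classes that mirror the schema shown in the question.
case class Properties(ErrorCode: String, ErrorDescription: String)
case class Event(event_category: String, event_name: String, properties: Properties)
case class Record(event: Event)

object ExplodeStructSketch {
  // Collects the fields of event.properties into an array,
  // then emits one output row per array element.
  def explodeProperties(df: DataFrame): DataFrame =
    df.withColumn("event_properties", explode(array(col("event.properties.*"))))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-struct-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data, for illustration only.
    val df_json = Seq(
      Record(Event("error", "login_failed", Properties("E42", "Bad password")))
    ).toDF()

    explodeProperties(df_json).select("event_properties").show(false)
    spark.stop()
  }
}
```

Because the struct has two fields, each input row becomes two output rows. The field names are lost in the process, so this is mainly useful when only the values matter.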

Ramesh Maharjan
8

As the error message says, you can only explode array or map types, not struct-type columns.

If you only want the struct as its own top-level column, you can just do:

df_json.withColumn("event_properties", $"event.properties")

This will generate a new column, event_properties, which is also of struct type.

If you want to convert every field of the struct to a new column, you cannot use withColumn. Instead, do a select with the wildcard *:

df_json.select($"event.properties.*")
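As a rough illustration of the wildcard select, the sketch below flattens the struct while keeping event_name next to the new columns. The case classes, the object name FlattenStructSketch, the helper flattenProperties, and the sample values are hypothetical and not part of the original question; it assumes a local Spark installation.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical case classes that mirror the schema shown in the question.
case class Properties(ErrorCode: String, ErrorDescription: String)
case class Event(event_category: String, event_name: String, properties: Properties)
case class Record(event: Event)

object FlattenStructSketch {
  // Promotes every field of event.properties to a top-level column
  // and keeps event_name alongside for context.
  def flattenProperties(df: DataFrame): DataFrame =
    df.select(col("event.event_name"), col("event.properties.*"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-struct-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data, for illustration only.
    val df_json = Seq(
      Record(Event("error", "login_failed", Properties("E42", "Bad password")))
    ).toDF()

    val flat = flattenProperties(df_json)
    flat.printSchema()
    flat.show(false)
    spark.stop()
  }
}
```

Unlike the array-based explode, this keeps one row per input row and preserves the field names as column names, and it also works when the struct fields have different types.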
Raphael Roth
0

You can use the following to flatten the struct. As the error message states, the explode function does not work on a struct.

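// Note: Dataset.explode has been deprecated since Spark 2.0.
// Event, email, and salary are not defined in this snippet and must be supplied by the caller;
// display is a Databricks notebook helper.
val explodeDF = parquetDF.explode($"event") {
  // This pattern expects the matched value to be a sequence of rows.
  case Row(properties: Seq[Row]) => properties.map { property =>
    val errorCode = property(0).asInstanceOf[String]
    val errorDescription = property(1).asInstanceOf[String]
    Event(errorCode, errorDescription, email, salary)
  }
}.cache()
display(explodeDF)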
Anurag Sharma