
I have a Spark DataFrame column (custHeader) in the format below, and I want to extract the value for the key "phone" into a separate column. I am trying to use the from_json function, but it gives me a null value.

valArr:array
   element:struct
      key:string
      value:string

   example: [{"key": "BusinessName", "value": "RoadWay Diner"}, {"key": "Center", "value": "Denver"}, {"key": "phone", "value": "123456789"}, {"key": "customer", "value": "alex"}, {"key": "type", "value": "restaurant"}]

The code below gives me a null value. The result I want is the value for the key "phone" extracted into a separate column.

    val sch = ArrayType(StructType(Array(
      StructField("key", StringType),
      StructField("value", BinaryType))))

    val df02 = df01.withColumn("ext_val", from_json(col("custHeader").cast(StringType), sch))

1 Answer


Assuming there is only one phone entry per row, the following method can be used. It filters the array of structs down to the element whose key is "phone" and takes that element's value, with no JSON parsing needed since custHeader is already an array of structs.

import pyspark.sql.functions as F

...
df = df.withColumn('ext_val', F.expr('filter(custHeader, x -> x.key = "phone")[0].value'))
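For illustration, here is what the `filter(custHeader, x -> x.key = "phone")[0].value` expression computes, mimicked in plain Python on the sample array from the question (a sketch of the logic only, no Spark required; the `filter` higher-order SQL function itself needs Spark 2.4+):

```python
# Sample row data from the question: an array of {key, value} structs.
cust_header = [
    {"key": "BusinessName", "value": "RoadWay Diner"},
    {"key": "Center", "value": "Denver"},
    {"key": "phone", "value": "123456789"},
    {"key": "customer", "value": "alex"},
    {"key": "type", "value": "restaurant"},
]

# filter(custHeader, x -> x.key = "phone") keeps structs whose key is "phone"...
phones = [x for x in cust_header if x["key"] == "phone"]

# ...and [0].value takes the value of the first match (None if no match).
ext_val = phones[0]["value"] if phones else None
print(ext_val)  # -> 123456789
```

Since the question uses Scala, the same SQL string can be passed through `expr` there as well, e.g. `df01.withColumn("ext_val", expr("""filter(custHeader, x -> x.key = "phone")[0].value"""))`.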