I have a dataframe with two columns. Each column contains a JSON string.

cola colb
{"name":"Adam", "age": 23} {"country" : "USA"}

I wish to convert it to:

cola_name cola_age colb_country
Adam 23 USA

How do I do this?


The approach I have in mind: if I can merge both JSON strings in the original dataframe into a single JSON object, I can then obtain the intended result with:

spark.read.json(df.select("merged_column").as[String])

But I can't find an easy way to merge two JSON objects into a single JSON object in Spark.

Update: The contents of the JSON are not known beforehand, so I am looking for a way to auto-detect the schema.

Jatin
  • Use the `from_json` function: `df.selectExpr("from_json(cola, 'struct<name:string,age:int>') as cola", "from_json(colb, 'country string') as country").selectExpr("cola.*", "country")` – blackbishop Dec 22 '21 at 19:49
  • Thanks. But what if the JSON contents are not known beforehand and the schema needs to be auto-detected? – Jatin Dec 23 '21 at 01:37
  • Do you always have simple JSON strings, with no nested structures? If so, you can use some string concatenation to merge them. – blackbishop Dec 23 '21 at 09:56
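The merge idea from the comments can also be done by parsing the strings instead of concatenating them. Here is a minimal sketch in plain Python, assuming each cell holds a flat JSON object (or null). `merge_prefixed` is a hypothetical helper, not a Spark function. It prefixes every key with its column name, so the merged object already carries the `cola_name` / `colb_country` naming the question asks for.

```python
import json


def merge_prefixed(**json_by_column):
    """Merge flat JSON strings into one JSON string, prefixing keys by column.

    Hypothetical helper: keyword names are the column names, values are the
    raw JSON strings found in those columns (None is skipped).
    """
    merged = {}
    for column, raw in json_by_column.items():
        if raw is None:
            continue
        for key, value in json.loads(raw).items():
            merged[f"{column}_{key}"] = value
    return json.dumps(merged)


print(merge_prefixed(cola='{"name":"Adam", "age": 23}', colb='{"country" : "USA"}'))
# prints {"cola_name": "Adam", "cola_age": 23, "colb_country": "USA"}
```

Wrapped in a UDF, something like this could probably produce the `merged_column` that the `spark.read.json(...)` call in the question expects, at the cost of a Python round trip per row.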

1 Answer

I'm more familiar with pyspark syntax. I think this works:

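from pyspark.sql import SparkSession
import pyspark.sql.functions as f
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Reuse the active session (notebooks usually define `spark` already)
spark = SparkSession.builder.getOrCreate()

# Explicit schema for each JSON column
schema_cola = StructType([
  StructField('name', StringType(), True),
  StructField('age', IntegerType(), True)
])
schema_colb = StructType([
  StructField('country', StringType(), True)
])

df = spark.createDataFrame([('{"name":"Adam", "age": 23}', '{"country" : "USA"}')], ['cola', 'colb'])

# Parse each JSON string into a struct, then expand the struct fields into columns.
# In a Databricks notebook, display(...) can be used instead of .show().
(df
 .withColumn('cola_struct', f.from_json(f.col('cola'), schema_cola))
 .withColumn('colb_struct', f.from_json(f.col('colb'), schema_colb))
 .select(f.col('cola_struct.*'), f.col('colb_struct.*'))
 .show())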

The output is a single row with the columns name, age and country (Adam, 23, USA).

ARCrow
  • Thanks. But what if we do not know the JSON contents beforehand? – Jatin Dec 23 '21 at 03:47
  • In that case I think this question can help https://stackoverflow.com/questions/49088401/spark-from-json-with-dynamic-schema – ARCrow Dec 24 '21 at 19:58
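For the unknown-schema case, one option is to let Spark infer the schema of each column by reading its strings with `spark.read.json`, then feed that schema back into `from_json`. The sketch below is only that, a sketch: `parse_expr`, `field_exprs` and `flatten_json_columns` are hypothetical names, and it assumes field names are plain identifiers that are safe to embed in a schema string. It costs one extra pass over the data per JSON column, and inferred types may differ from a hand-written schema (integers come back as `bigint`).

```python
def parse_expr(column, ddl):
    """SQL expression that parses a JSON string column into a struct column."""
    return f"from_json(`{column}`, '{ddl}') AS `{column}`"


def field_exprs(column, field_names):
    """SQL expressions that pull each struct field out as `<column>_<field>`."""
    return [f"`{column}`.`{name}` AS `{column}_{name}`" for name in field_names]


def flatten_json_columns(spark, df, json_columns):
    """Infer a schema per JSON column, parse it, and flatten with prefixes.

    Assumption: `spark` is an active SparkSession and `df` holds JSON strings
    in `json_columns`. Columns not listed in `json_columns` are dropped.
    """
    parse, flatten = [], []
    for column in json_columns:
        # Let Spark scan the non-null strings of this column to infer a schema.
        strings = df.select(column).dropna().rdd.map(lambda row: row[0])
        schema = spark.read.json(strings).schema
        parse.append(parse_expr(column, schema.simpleString()))
        flatten.extend(field_exprs(column, schema.fieldNames()))
    return df.selectExpr(*parse).selectExpr(*flatten)


# Usage, assuming `spark` and the `df` from the answer above exist:
# flatten_json_columns(spark, df, ['cola', 'colb']).show()
```

Nested objects would stay as struct columns here. Flattening them further would need a recursive walk over the inferred schema.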