Pyspark Edit Schema (json column)

Question

I have a dataframe whose schema looks like this:

root
 |-- nro_ot: decimal(12,0) (nullable = true)
 |-- json_bcg: string (nullable = true)

The column "json_bcg" is just a string, and I need to edit the schema so that I can explore its contents.

The explode() function doesn't work on it.

Are you looking for the [pyspark.sql.functions.json_tuple()](https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.json_tuple.html) function? — Domi, Sep 07 '22 at 20:03

score 0 · Answer 1 · answered Sep 07 '22 at 21:30

The question "Pyspark: Parse a column of json strings" helped me. The approach from there:

from pyspark.sql.functions import from_json, col
json_schema = spark.read.json(df.rdd.map(lambda row: row.json)).schema
df.withColumn('json', from_json(col('json'), json_schema))

In my case I edited it a little bit:

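import pyspark.sql.functions as f

df = spark.sql('Select nro_ot, json_bcg from sandbox_did_sio_phernandez.Batch_10')

# Infer the JSON schema by reading the string column as a JSON dataset
json_schema = spark.read.json(df.rdd.map(lambda row: row.json_bcg)).schema
# Replace the string column with a struct parsed using the inferred schema
df = df.withColumn('json_bcg', f.from_json(f.col('json_bcg'), json_schema))

# display() is a Databricks notebook helper; use df.show() elsewhere
display(df)
df.printSchema()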
