I have a DataFrame with a column containing a JSON string, which I convert to a dictionary using the from_json function. The problem occurs when the JSON contains an unusual string such as '\\"cde\\"'. The full JSON is: '{"key":"abc","value":"\\"cde\\""}'.
When from_json is applied, it returns null. I think this is because it treats \\ as a single character and then cannot parse the value because of the extra " characters inside it.
Here is a simple code snippet:
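from pyspark.sql.functions import col, from_json

# json_schema is defined elsewhere (not shown here)
df = spark.createDataFrame(
    [
        (1, '{"key":"abc","value":"\\\\"cde\\\\""}')
    ],
    ["id", "text"]
)
# from_json returns null for this row
df = df.withColumn('dictext', from_json(col('text'), json_schema))
display(df)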
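The same behaviour seems to be reproducible outside Spark. The following is a minimal sketch using Python's standard json module, which is not Spark's parser, so it is only an analogy. The names raw, try_parse, broken and fixed are introduced here for illustration.

```python
import json

# Same text as in the DataFrame: two backslashes before each inner quote
raw = '{"key":"abc","value":"\\\\"cde\\\\""}'


def try_parse(text):
    """Return the parsed dict, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# \\ is read as one escaped backslash, so the next " closes the string early
broken = try_parse(raw)

# With a single backslash before each inner quote, the value parses
fixed = try_parse('{"key":"abc","value":"\\"cde\\""}')
```

Here broken is None while fixed holds the value with its inner quotes, which suggests the doubled backslash is what makes the text invalid JSON.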
Is there a way to clean such JSON, or to encode it somehow before calling from_json, or is there another function that can parse such a string?
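One possible cleaning step, sketched in plain Python under the assumption that every \\" in the data is an over-escaped quote: replace the two backslashes before a quote with one, then parse. The helper name collapse_backslashes is hypothetical, and the check uses the standard json module rather than Spark.

```python
import json
import re

# Same text as in the DataFrame: two backslashes before each inner quote
raw = '{"key":"abc","value":"\\\\"cde\\\\""}'


def collapse_backslashes(text):
    """Turn each backslash-backslash-quote sequence into backslash-quote.

    Assumption: such a sequence is always an over-escaped quote. A value
    that legitimately ends in a backslash would be corrupted by this.
    """
    return re.sub(r'\\\\"', r'\\"', text)


cleaned = collapse_backslashes(raw)
parsed = json.loads(cleaned)
```

If this assumption holds for the real data, the same pattern could probably be applied to the column with regexp_replace from pyspark.sql.functions before calling from_json, but I have not verified that here.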