
Currently I am using this to write the output to a single partition:

 df.coalesce(1).write
   .format("json")
   .mode("overwrite")
   .option("path", writePath)
   .save()

The output file is currently in this format:

{obj1}
{obj2}

I want it as an array of JSON objects: [{obj1}, {obj2}]

1 Answer


Spark writes JSON in line-delimited (JSON Lines) format, where each line is a separate, self-contained, valid JSON object: https://spark.apache.org/docs/latest/sql-data-sources-json.html

However for your desired output,

df.toJSON.collect().mkString("[", ",", "]")

Note that collect on large DataFrames is not recommended, since it pulls the entire dataset into the driver's memory.

Aditya
  • thanks, but I am not able to use the df.write method after using the above function – user17773575 Mar 22 '22 at 14:29
  • df.write won't work directly, because the result is a String, not a DataFrame. As per Spark, each row is an independent JSON object. If you want to use df.write, then collect the df -> build the String -> convert it to a DataFrame with a single row and single column -> use df.write – Aditya Apr 01 '22 at 14:49
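The collect -> string -> single-row Dataset -> write pipeline described in the comment above might look like the following sketch; `spark`, `df`, and `writePath` are assumed to exist as in the question, and this is only one way to do it, not a definitive implementation:

```scala
// Assumes an existing SparkSession `spark`, a DataFrame `df`,
// and an output path `writePath`, as in the question.
import spark.implicits._

// 1. Collect every row as a JSON string and join into one array literal.
val jsonArray: String = df.toJSON.collect().mkString("[", ",", "]")

// 2. Wrap the single string in a one-row, one-column Dataset.
val singleRowDs = Seq(jsonArray).toDS()

// 3. Write it out as plain text so the array string is emitted verbatim
//    (the "json" writer would re-escape it as a quoted string value).
singleRowDs.coalesce(1)
  .write
  .mode("overwrite")
  .text(writePath)
```

The same collect-on-the-driver caveat applies: this only works when the whole result fits in driver memory.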