I am new to Apache Spark 1.3.1. How can I convert a JSON file to Parquet?
odbhut.shei.chhele
You could also use Apache Drill (maybe easier to set up). It can convert JSON from a local filesystem to Parquet on HDFS in one line of SQL: "CREATE TABLE dfs.drill.`/test5/` AS (SELECT * FROM dfs.gen.`/2016/10/*/*.json` e);". If you are interested, see https://drill.apache.org/docs/parquet-format/. – Thomas Decaux Oct 05 '16 at 07:14
1 Answer
Spark 1.4 and later
You can use Spark SQL to first read the JSON file into a DataFrame, then write the DataFrame as a Parquet file.
val df = sqlContext.read.json("path/to/json/file")
df.write.parquet("path/to/parquet/file")
or, using the older save method (deprecated since Spark 1.4 and removed in Spark 2.0):
df.save("path/to/parquet/file", "parquet")
Check here and here for examples and more details.
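Outside the spark-shell, sqlContext is not predefined, so the two lines above need a little setup. The following is only a minimal sketch of a standalone program for Spark 1.4 or later. The object name JsonToParquet, the convert helper, the local[*] master and the command-line arguments are assumptions made for illustration and are not part of the original answer. Note that read.json expects one self-contained JSON object per line by default.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical wrapper object; the name is chosen for this example only.
object JsonToParquet {

  // Reads a JSON file (one JSON object per line), writes it as Parquet,
  // and returns the number of rows read back from the Parquet output.
  def convert(sqlContext: SQLContext, jsonPath: String, parquetPath: String): Long = {
    val df = sqlContext.read.json(jsonPath)
    // "overwrite" replaces existing output; the default mode fails if the path exists.
    df.write.mode("overwrite").parquet(parquetPath)
    sqlContext.read.parquet(parquetPath).count()
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running on a single machine.
    val conf = new SparkConf().setAppName("JsonToParquet").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    try {
      // args(0): input JSON path, args(1): output Parquet directory
      val rows = convert(sqlContext, args(0), args(1))
      println(s"Wrote $rows rows to ${args(1)}")
    } finally {
      sc.stop()
    }
  }
}
```

Reading the output back and counting the rows is a cheap sanity check that the write succeeded. Drop it if the extra pass over the data is too expensive.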
Spark 1.3.1
val df = sqlContext.jsonFile("path/to/json/file")
df.saveAsParquetFile("path/to/parquet/file")
Issue related to Windows and Spark 1.3.1
Saving a DataFrame as a Parquet file on Windows throws a java.lang.NullPointerException, as described here.
In that case, consider upgrading to a more recent Spark version.

Rami
I am getting a NullPointerException when I call saveAsParquetFile. – odbhut.shei.chhele Jan 12 '16 at 11:12
I have just tried exactly these two lines of code on spark-1.3.1-bin-hadoop2.6 and they worked. Please check your code, and make sure that you are not writing to a non-existent directory and that you are reading the file into the DataFrame correctly. – Rami Jan 12 '16 at 11:26
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/100463/discussion-between-rami-and-eddard-stark). – Rami Jan 12 '16 at 11:34