
I am new to Spark. I can load a single .json file in Spark, but what if there are thousands of .json files in a folder? [picture of the .json files in the folder]

I also have a CSV file that classifies the .json files with labels. [picture of the CSV file]

What should I do in Spark to load and save this data? For example, I want to load the first row of the CSV. It is text information, but it gives the path of a .json file; I want to load that .json and save the output, so that I know the JSON information for the first graph labeled "Trusted".

Fengyu

1 Answer


For the JSON files, point the reader at the whole folder and Spark will read every file in it (note that this returns a DataFrame, not an RDD):

json_df = sql_context.read.json("path/to/json_folder/")

For CSV, install the spark-csv package from here: Databricks' spark-csv

csv_df = sql_context.read.load("path/to/csv_folder/", format='com.databricks.spark.csv', header='true', inferSchema='true')
Alberto Bonsanto
Neel Tiwari
  • Thanks. Another question: how can I make the thousands of .json files load in parallel? Map & Reduce? – Fengyu Jun 20 '16 at 21:15
  • Also, note that from 2.0.0 onwards parsing CSV will be a part of Spark itself and you won't have to rely on spark-csv anymore. – BenFradet Jun 21 '16 at 07:51