-1

I try to generate csv file from json type data. These are my json test data.

{
  "realtime_start":"2020-09-25",
  "realtime_end":"2020-09-25",,
  "units": "Percent",
  "seriess": [
    {
      "name": "James",
      "age": 29,
      "house": "CA"
    },
    {
      "name": "Jina",
      "age": 39,
      "house": "MA",
      "notes": "Million tonne punch"
    },
}

The problem is json array type "seriess" does not contain "notes" node in all every nodes. I made the below java codes to change this json data to csv file with header columns

JSONObject json = getJsonFileFromURL(...)

JSONArray docsArray = json.getJSONArray("seriess");
docsArray.put(json.get("realtime_start"));
docsArray.put(json.get("realtime_end"));
docsArray.put(json.get("units"));

JsonNode jsonTree = new ObjectMapper().readTree(docsArray.toString());
        
Builder csvSchemaBuilder = CsvSchema.builder();
for(JsonNode node : jsonTree) {
    node.fieldNames().forEachRemaining(fieldName -> {csvSchemaBuilder.addColumn(fieldName);} );
}

CsvSchema csvSchema = csvSchemaBuilder.build().withHeader();
CsvMapper csvMapper = new CsvMapper();
csvMapper.writerFor(JsonNode.class).with(csvSchema).writeValue(new File("test.csv"), jsonTree);

But the incorrect results are shown like below,

realtime_start,realtime_end,units,names,age,house,realtime_start,realtime_end,units,names,age,house,notes, realtime_start,.....

The generated header columns does not contain distinct values. The header columns are added in duplicate. How can I generate the distinct header like below

realtime_start,realtime_end,units,names,age,house, notes

Any idea?

Update Part

I try to extract data from the FRED (FEDERAL RESERVE BANK OF ST. LOUIS). FRED provide simple and convenient Python api like below,

from fredapi import Fred 
import pandas as pd

fred = Fred(api_key='abcdefghijklmnopqrstuvwxyz0123456789')
data_unemploy = fred.search('Unemployment Rate in California')
data_unemploy.to_csv("test_unemploy.csv")

But the java apis are deprecated, so I have to develop simple Java api which convert json values to csv file. I found the below Java codes with googling

JSONObject json = getJsonFileFromURL("https://api.stlouisfed.org/fred/series/search?search_text=Unemployment+Rate+in+California&api_key=abcdefghijklmnopqrstuvwxyz0123456789&file_type=json");
        
JSONArray docsArray = json.getJSONArray("seriess");
docsArray.put(json.get("realtime_start"));
docsArray.put(json.get("realtime_end"));

JsonNode jsonTree = new ObjectMapper().readTree(docsArray.toString());
JsonNode firstObject = jsonTree.elements().next();  // I am struggling with this line 
firstObject.fieldNames().forEachRemaining(fieldName -> {csvSchemaBuilder.addColumn(fieldName);} );
CsvSchema csvSchema = csvSchemaBuilder.build().withHeader();
        
CsvMapper csvMapper = new CsvMapper();
csvMapper.writerFor(JsonNode.class).with(csvSchema).writeValue(new File("test.csv"), jsonTree);

To extract columns from json data JsonNode firstObject = jsonTree.elements().next(); return the first json node. But this line does not return notes column. because the first line does not contain the notes key value.

So I change this code line to following lines

for(JsonNode node : jsonTree) {
    node.fieldNames().forEachRemaining(fieldName -> {
        csvSchemaBuilder.addColumn(fieldName);
    } );
}

But these lines throws the results which I do not expect. The repeated duplicated columns like below

realtime_start,realtime_end,units,names,age,house,realtime_start,realtime_end,units,names,age,house,notes, realtime_start,.....

I am totally stuck with this part.

halfer
  • 19,824
  • 17
  • 99
  • 186
Joseph Hwang
  • 1,337
  • 3
  • 38
  • 67
  • Well what did you expect? Have you thought this through properly, you are flattening a complex structure to a single row so you need to have a strategy for creating a new row for each element in the array and including values from outside of the array in multiple rows. You also need to keep track of all possible columns and makes sure defaults values gets written in case a value is missing in the json. – Joakim Danielson Sep 26 '20 at 06:45

2 Answers2

1

You can do it with a library Apache Commons IO

pom.xml

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.6</version>
</dependency>

ConvertJsonToCSVTest.java

import java.io.File;
import org.apache.commons.io.FileUtils;
import org.json.*;
public class ConvertJsonToCSVTest {
   public static void main(String[] args) throws JSONException {
      String jsonArrayString = "{\"fileName\": [{\"first name\": \"Adam\",\"last name\": \"Smith\",\"location\": \"London\"}]}";
      JSONObject output;
      try {
         output = new JSONObject(jsonArrayString);
         JSONArray docs = output.getJSONArray("fileName");
         File file = new File("EmpDetails.csv");
         String csv = CDL.toString(docs);
         FileUtils.writeStringToFile(file, csv);
         System.out.println("Data has been Sucessfully Writeen to "+ file);
         System.out.println(csv);
      }
      catch(Exception e) {
         e.printStackTrace();
      }
   }
}

Output

Data has been Sucessfully Writeen to EmpDetails.csv
last name,first name,location
Smith,Adam,London
ArmDuke
  • 100
  • 4
1

Most probably it is easiest to write a bin type class like below :

public class CsvVo {

    private String realtime_start;
    private String realtime_end;
    private String units;
    private String name;
    private String age;
    private String house;
    private String notes;

    public void setRealtime_start(String realtime_start) {
        this.realtime_start = realtime_start;
    }

//Other getters and Setters

Then you can Write :

public class ConvertJsonToCSVTest {
    public static void main(String[] args) throws JSONException {
        String jsonArrayString = "{\n" +
                "\t\"realtime_start\": \"2020-09-25\",\n" +
                "\t\"realtime_end\": \"2020-09-25\",\n" +
                "\t\"units\": \"Percent\",\n" +
                "\t\"seriess\": [{\n" +
                "\t\t\t\"name\": \"James\",\n" +
                "\t\t\t\"age\": 29,\n" +
                "\t\t\t\"house\": \"CA\"\n" +
                "\t\t},\n" +
                "\t\t{\n" +
                "\t\t\t\"name\": \"Jina\",\n" +
                "\t\t\t\"age\": 39,\n" +
                "\t\t\t\"house\": \"MA\",\n" +
                "\t\t\t\"notes\": \"Million tonne punch\"\n" +
                "\t\t}\n" +
                "\t]\n" +
                "}";
        JSONObject inJson;
            List<CsvVo> list = new ArrayList<>();
            inJson = new JSONObject(jsonArrayString);
            JSONArray inJsonSeries = inJson.getJSONArray("seriess");
            for (int i = 0, size = inJsonSeries.length(); i < size; i++){
                CsvVo line = new CsvVo();
                line.setRealtime_start(inJson.get("realtime_start").toString());
                line.setRealtime_end(inJson.get("realtime_end").toString());
                line.setUnits(inJson.get("units").toString());
                JSONObject o = (JSONObject)inJsonSeries.get(i);
                line.setName(o.get("name").toString());
                line.setAge(o.get("age").toString());
                line.setHouse(o.get("house").toString());
                try {
                    line.setNotes(o.get("notes").toString());
                }catch (JSONException e){
                    line.setNotes("");
                }
                list.add(line);
            }
            String[] cols = {"realtime_start", "realtime_end", "units", "name", "age", "house", "notes"};
            CsvUtils.csvWriterUtil(CsvVo.class, list, "in/EmpDetails.csv", cols);

        }
    }

csvWriterUtil is like below :

    public static <T> void csvWriterUtil(Class<T> beanClass, List<T> data, String outputFile, String[] columnMapping){
        try{
            Writer writer = new BufferedWriter(new FileWriter(outputFile));
            ColumnPositionMappingStrategy<T> strategy = new ColumnPositionMappingStrategy<>();
            strategy.setType(beanClass);
            strategy.setColumnMapping(columnMapping);
            StatefulBeanToCsv<T> statefulBeanToCsv =new StatefulBeanToCsvBuilder<T>(writer)
                    .withMappingStrategy(strategy)
                    .build();
            writer.write(String.join(",",columnMapping)+"\n");
            statefulBeanToCsv.write(data);
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (CsvRequiredFieldEmptyException e) {
            e.printStackTrace();
        } catch (CsvDataTypeMismatchException e) {
            e.printStackTrace();
        }
    }

Full example is available in GitRepo

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
soumya-kole
  • 1,111
  • 7
  • 18
  • Thank you for your reply. I have more question. In above question my input json data type is simply String. But in my project, my input source is text file which contains hundreds of json data. So columns header of csv file are generated in duplicate on and on. How can I remove the duplicated columns value, so get the distinct header column value? – Joseph Hwang Sep 26 '20 at 23:22
  • You can write a function to read the file line by line (assuming one Json string per line) and return List from that function. Then just pass the List to csvWriterUtil. Let me know if it helps you. – soumya-kole Sep 27 '20 at 07:49
  • Dear, @soumya-kole , Please check my Updated Part and leave your advice. – Joseph Hwang Sep 28 '20 at 10:10
  • What do you want to use. If you want to use Jackson, then use it or org.json library can be used for this simple example. Your updated part is probably mixing both and complicating it. – soumya-kole Sep 28 '20 at 18:07