
I'm writing a deserializer that reads a huge JSON file and puts the records matching a filter (logic in my application) into a database. The JSON file has a fixed schema, as follows:

{
    "cityDetails": {
        "name": "String",
        "pinCodes": "Array of integers ",
        "people": [{
            "name": "String",
            "age": "Integer"
        }]
    }
}

I am only interested in streaming the list of "people" from the file. I am aware that GSON and Jackson provide streaming APIs that I could use, but I want to avoid looping through the tokens as they are streamed and matching each name to see whether I am interested in it. I believe there should be a solution that does the streaming in the background and seeks the stream to the token I am interested in. I don't see why this should not be possible if I provide my JSON schema. Is there a solution available for this?

Here's a sample instance of my JSON:

{
    "cityDetails": {
        "name": "mumbai",
        "pinCodes": ["400001", "400002"],
        "people": [{
            "name": "Foo",
            "age": 1
        }, {
            "name": "Bar",
            "age": 2
        }]
    }
}

2 Answers


With GSON I would just create corresponding DTOs for the data to be parsed.

So you have some wrapper that is the root object:

@Getter
public class Wrapper {
    private CityDetails cityDetails; 
}

and city details:

@Getter
public class CityDetails {
    private List<Person> people;
}

and possibly many Persons in the list people:

@Getter
@ToString
public class Person {
    private String name;
    private Integer age;
}

Then you can read the file through a Reader, for example:

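@Test
public void test() throws IOException {
    Gson gson = new Gson();
    // assuming your json is named "test.json" in the same directory as the test class
    // (log is assumed to be an SLF4J logger, e.g. from Lombok's @Slf4j)
    try (Reader r = new InputStreamReader(
            getClass().getResourceAsStream("test.json"), StandardCharsets.UTF_8)) {
        Wrapper wrapper = gson.fromJson(r, Wrapper.class);
        // only cityDetails.people is mapped; name and pinCodes have no DTO field
        wrapper.getCityDetails().getPeople().forEach(p -> log.info("{}", p));
    }
}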

Gson looks for and instantiates only the fields declared in the DTO classes; everything else in the JSON is ignored during parsing.
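One caveat for a huge file: fromJson on the root object builds the complete people list in memory before you can filter it. A possible middle ground, sketched below, is to use Gson's JsonReader only to reach cityDetails.people and then bind one Person at a time. It still compares the two property names on the way in, but not every token. The class name PeopleStreamer, the method streamPeople, and the filter/sink parameters are made up for this illustration, and the DTO is written without Lombok so the sketch compiles on its own.

```java
import com.google.gson.Gson;
import com.google.gson.stream.JsonReader;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class PeopleStreamer {

    // Plain DTO (hypothetical stand-in for the Lombok-based Person above).
    public static class Person {
        public String name;
        public Integer age;
    }

    /**
     * Skips everything except cityDetails.people, then binds and filters
     * one array element at a time instead of materialising the whole list.
     */
    public static void streamPeople(Reader source, Predicate<Person> filter,
                                    Consumer<Person> sink) throws IOException {
        Gson gson = new Gson();
        try (JsonReader reader = new JsonReader(source)) {
            reader.beginObject();                          // root object
            while (reader.hasNext()) {
                if (!"cityDetails".equals(reader.nextName())) {
                    reader.skipValue();                    // unrelated top-level value
                    continue;
                }
                reader.beginObject();                      // cityDetails
                while (reader.hasNext()) {
                    if (!"people".equals(reader.nextName())) {
                        reader.skipValue();                // name, pinCodes
                        continue;
                    }
                    reader.beginArray();
                    while (reader.hasNext()) {
                        // Binds exactly one array element to a Person.
                        Person p = gson.fromJson(reader, Person.class);
                        if (filter.test(p)) {
                            sink.accept(p);                // e.g. insert into the database
                        }
                    }
                    reader.endArray();
                }
                reader.endObject();
            }
            reader.endObject();
        }
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"cityDetails\":{\"name\":\"mumbai\",\"pinCodes\":[\"400001\"],"
                + "\"people\":[{\"name\":\"Foo\",\"age\":1},{\"name\":\"Bar\",\"age\":2}]}}";
        List<String> kept = new ArrayList<>();
        streamPeople(new StringReader(json), p -> p.age >= 2, p -> kept.add(p.name));
        System.out.println(kept);                          // prints [Bar]
    }
}
```

In a real application the sink would be whatever writes to the database, and the Reader would wrap the file rather than a string.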

pirho

A nice way of doing this would be to use JsonPath.

A json path of:

$.cityDetails.people

will return just the contents of the people array:

[
  [
    {
      "name": "Foo",
      "age": 1
    },
    {
      "name": "Bar",
      "age": 2
    }
  ]
]

Here is a Java implementation...
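To make it concrete what a path like $.cityDetails.people selects, here is a minimal plain-Java sketch that resolves simple dotted paths against nested Maps. It is not the JsonPath library and handles none of the real syntax (wildcards, filters, array indexes). PathSketch and select are hypothetical names. It also works on data that has already been parsed, so by itself it does not stream anything.

```java
import java.util.List;
import java.util.Map;

public class PathSketch {

    /**
     * Resolves a simple dotted path such as "$.cityDetails.people" against
     * nested Maps. Only plain property steps are supported.
     */
    public static Object select(Map<String, Object> root, String path) {
        Object current = root;
        for (String step : path.split("\\.")) {
            if (step.equals("$")) {
                continue;                      // "$" denotes the document root
            }
            if (!(current instanceof Map)) {
                return null;                   // path goes deeper than the data
            }
            current = ((Map<?, ?>) current).get(step);
        }
        return current;
    }

    public static void main(String[] args) {
        // Same shape as the sample instance in the question.
        Map<String, Object> doc = Map.of(
            "cityDetails", Map.of(
                "name", "mumbai",
                "pinCodes", List.of("400001", "400002"),
                "people", List.of(
                    Map.of("name", "Foo", "age", 1),
                    Map.of("name", "Bar", "age", 2))));
        Object people = select(doc, "$.cityDetails.people");
        System.out.println(((List<?>) people).size());   // prints 2
    }
}
```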

tom redfern