
It's a question about the train of thought, so please don't ask me to use a third-party library to deal with this.

Recently, I had a job interview, and there was a question like the one below:

There is a huge JSON file, structured like a database:

    {
      "tableName1": [
        {"t1field1": "value1"},
        {"t1field2": "value2"},
        ...
        {"t1fieldN": "valueN"}
      ],
      "tableName2": [
        {"t2field1": "value1"},
        {"t2field2": "value2"},
        ...
        {"t2fieldN": "valueN"}
      ],
      ...
      "tableNameN": [
        {"tNfield1": "value1"},
        {"tNfield2": "value2"},
        ...
        {"tNfieldN": "valueN"}
      ]
    }

And the requirements are:

  1. Find a particular child node by the given child-node name, update its field's value, and save the result to a new JSON file.
  2. Count the occurrences of a given field name and value.

When it's a normal-sized JSON file, I wrote a utility class to load the JSON file from disk and parse it into a JSONObject. Then I wrote two methods to handle the two requirements:

    void upDateAndSaveJson(JSONObject json, String nodeName,
            Map<String, Object> map, Map<String, Object> updateMap,
            String outPath) {
        // map holds the target child node's match conditions
        // updateMap holds the update conditions
        // first find the target child node, update it, and finally save it
        // ...code...
    }

    int getCount(JSONObject json, Map<String, Object> map) {
        // map holds the target field/value pairs
        // ...code...
    }

But the interviewer asked me to think about the situation where the JSON file is very large, then modify my code and make it more efficient.

My idea is to write a tool to split the JSON file first. Since I ultimately need a JSON object to invoke the previous two methods, before splitting the huge JSON file I already know their parameters: a Map (holding the target child node's conditions, or the target field/value pairs) and nodeName (the child-node name).

So while loading the JSON file, I compare the input-stream string with the target nodeName, and then start counting the objects in that child node. If the rule is 100, then once it has 100 objects, I split the child node out into a new, smaller JSON file and remove it from the source JSON file.

Like below:

    String line;
    while ((line = reader.readLine()) != null) {
        for (String nodeName : nodeNames) {
            // check whether this line contains the target node
            if (line.indexOf(nodeName) != -1) {
                // count the target child node's objects,
                // then split them out into a smaller JSON file
            }
        }
    }
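As a variation on the same line-scanning idea, the counting requirement can sometimes be answered directly from the stream without building any JSON object at all. The sketch below counts occurrences of a given field/value pair while reading line by line; `StreamingCounter` and `countFieldValue` are hypothetical names, and it assumes a pretty-printed file with at most one field per line and no escaped quotes inside values:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class StreamingCounter {

    /**
     * Counts occurrences of the pattern "field":"value" while reading the
     * input line by line, so the whole document never has to fit in memory.
     * Sketch only: assumes at most one field per line and that values
     * contain no escaped quote characters.
     */
    public static int countFieldValue(Reader source, String field, String value)
            throws IOException {
        String needle = "\"" + field + "\":\"" + value + "\"";
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // strip spaces so indentation and "key": "value" spacing
                // do not affect matching
                if (line.replace(" ", "").contains(needle)) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Because each line is discarded after it is checked, memory use stays constant no matter how large the file is.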

After that, I can use multiple threads to load the smaller JSON files created earlier and invoke the two methods to process the JSON objects.
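That fan-out step could be sketched with a standard ExecutorService; `ParallelProcessor` and `countInFile` are hypothetical names here, and the per-file function stands in for whatever per-file work is needed (e.g. parsing a split file and calling getCount on it):

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelProcessor {

    /**
     * Runs one counting task per split file on a fixed thread pool and sums
     * the partial results. countInFile is a placeholder for the real
     * per-file work (parse the chunk, invoke getCount on it).
     */
    public static int countAcrossFiles(List<Path> splitFiles,
                                       Function<Path, Integer> countInFile)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Math.max(1, Math.min(splitFiles.size(),
                        Runtime.getRuntime().availableProcessors())));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Path file : splitFiles) {
                futures.add(pool.submit(() -> countInFile.apply(file)));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // blocks until that file's count is ready
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Since the counts are independent per file, summing the partial results is safe; the update requirement would need a final merge step to recombine the modified chunks into the output file.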

It's a question about the train of thought, so please don't just tell me to use a third-party library to deal with this problem.

So, is my approach feasible? Or if you have some other idea, please share it.

Thanks.

leo
  • The problem is not related to the fact that the JSON data is huge. Rather, it is a question of searching for the keyword, fetching that JSON object, and changing/updating its values. – Keshav Aug 21 '15 at 20:06
  • load it into a db and have it indexed – RisingSun Aug 21 '15 at 20:39
  • assuming the structure of the JSON is mostly like what you have described, can you not stream and parse just one tableName1 at a time, check for the condition (field name, field value), dump it into the outfile (modifying the field value if required), and then move on to process the next (tableName) element in the stream? – Amm Sokun Aug 22 '15 at 12:21

0 Answers