I am using JsonPath to retrieve an array from a ~30 MB JSON file. I chose it because loading the file into a BufferedReader crashes the app due to lack of memory.

The problem is that a single call takes very long (~2 min) to return one array. When I give the app more memory by setting largeHeap = true, the whole process (buffering, then straightforward JSON parsing) takes less than 10 seconds.

My method is simple (Kotlin):

val indexes: List<String> = JsonPath.read(stream, "$.root.word")

Here, stream is an InputStream (a GZIPInputStream). No matter the size of the retrieved array (some have more than 1000 elements, while others have fewer than 20), it still takes around the same time.

I know it is not supposed to be that slow. What am I missing?

And if that is always the case, is there any suggestion on how to parse a large JSON file without loading it into memory?

Ali_Habeeb
  • Of course it takes the same time, because it still has to **read** all ~30 MB from the stream, regardless of how much of it you need. The time is what it takes to parse the 30 MB of JSON text, not what it takes to apply the JSON path. – Andreas Nov 05 '20 at 19:28
  • *FYI:* [JsonPath](https://github.com/json-path/JsonPath#jayway-jsonpath) is not a streaming processor. It always parses the entire JSON text into memory, similar to how DOM is a full load of an XML document. You can't use JsonPath if you only want to load the list of words, in order to keep memory use low. For that you'd need a streaming parser. – Andreas Nov 05 '20 at 19:33
  • So what advantage does JsonPath have over normal JSON parsing? Isn't a path query recommended for its speed? Or just for memory-related problems? – Ali_Habeeb Nov 05 '20 at 19:36
  • No, not at all. Path query is recommended for its **ease of use**. As with most cases, ease of use comes at a cost, which in this case, with this library, is that it requires fully parsing the JSON text into memory, even if you only need a small part of it, meaning that it has a high memory footprint. – Andreas Nov 05 '20 at 19:39
  • *"any suggestion on how to parse a large json file without loading it into memory"* --- I already told you: Use a **streaming** JSON parser. --- (Hint: That should trigger you to go search the web to learn more about those. Do not ask here until after you've done that research yourself) – Andreas Nov 05 '20 at 19:42
  • If I got you right, I've already tried Klaxon and Gson, but faced some complexities with nested object/array values, and thought JsonPath would be the solution. Thanks for the comments and the hint. – Ali_Habeeb Nov 05 '20 at 19:55
  • If you need help doing this with a streaming parser, create a new question, show what you've tried, and explain how that's failing to do what you want. Make sure to show an example of the JSON you're attempting to parse. – Andreas Nov 05 '20 at 20:08
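
For reference, here is a minimal sketch of the streaming approach suggested in the comments. It assumes Gson's `JsonReader` (`com.google.gson.stream`) is available, that the document has the shape implied by `$.root.word` (a top-level object containing a `root` object whose `word` key holds an array), and that the array elements are strings. The helper name `readWords` and its parameters are hypothetical, not part of any library. Everything outside the requested array is passed over with `skipValue()`, so no in-memory tree is built for it.

```kotlin
import com.google.gson.stream.JsonReader
import java.io.InputStream
import java.io.InputStreamReader

// Hypothetical helper: walks the JSON token by token and collects only the
// string array stored at <outer>.<inner>; all other values are skipped.
fun readWords(
    stream: InputStream,
    outer: String = "root",   // assumed name of the enclosing object
    inner: String = "word"    // assumed name of the array to extract
): List<String> {
    val result = mutableListOf<String>()
    JsonReader(InputStreamReader(stream, Charsets.UTF_8)).use { reader ->
        reader.beginObject()                      // top-level object
        while (reader.hasNext()) {
            if (reader.nextName() != outer) {
                reader.skipValue()                // not the object we want
                continue
            }
            reader.beginObject()                  // the "root" object
            while (reader.hasNext()) {
                if (reader.nextName() != inner) {
                    reader.skipValue()            // skip nested objects/arrays
                    continue
                }
                reader.beginArray()               // the "word" array
                while (reader.hasNext()) {
                    result.add(reader.nextString())
                }
                reader.endArray()
            }
            reader.endObject()
        }
        reader.endObject()
    }
    return result
}
```

With the `stream` from the question, the call would look like `val indexes = readWords(stream)`. The parser still has to read the stream up to the end of the document, so this addresses the memory footprint rather than guaranteeing any particular speed.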
