
I have to process a very large JSON file (86 GB). I have tried a few different methods of parsing it, but each one either ran out of memory or crashed my computer, and none of them produced the output I need anyway.

The input I have is a list of product keys, and the output I need is only the records in the JSON file that pertain to those product keys. Is it possible to read this JSON file and filter it for only the relevant records?

Here is the schema for the file:

{
  "groups": [
    {
      "groupID",
      "model",
      "groupname",
      "productCodes",
      "descriptors",
      "externalIDs"
    },
    {groupID,...},
    ...
  ]
}

"productCodes" is an array that contains multiple product keys, like so:

"productCodes": [{
    "type": "productkey",
    "value": "DEBL6"
  }, {
    "type": "productkey",
    "value": "GBAY4"
  }, {
    "type": "productkey",
    "value": "GBAYE"
  }, {
    "type": "productkey",
    "value": "GBQRF"
  }, {
    "type": "productkey",
    "value": "GBZTD"
  }, {
    "type": "productkey",
    "value": "ZA42A"
  }
],
rr_goyal
Tyler Moore
  • How much memory is available? How much memory are the filtered data expected to require? – Michael Ruth Jul 26 '23 at 23:40
  • Slurping the file isn't really practical here due to the JSON structure, but check out this [article](https://datastation.multiprocess.io/blog/2022-01-06-analyzing-large-json-files-via-partial-json-parsing.html) if you're interested in an approach. Try loading the data into a database (MongoDB seems obvious) and query for the result. – Michael Ruth Jul 26 '23 at 23:44
  • Please provide an example of how you're reading this file. – Michael Ruth Jul 26 '23 at 23:52

1 Answer


You can try using jq on the command line to select the groups that contain a given product key, and redirect the output to another file:

jq '.groups[] | select(.productCodes[].value == "GBAY4")' your_file.json > output.json
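
If jq is not available, or if it needs more memory than the machine has (without its streaming mode, jq parses each input value completely before applying the filter), the same selection can be sketched in plain Python using only the standard library. This is a minimal sketch under several assumptions: the file is UTF-8 text, "groups" is the first key so the first "[" in the file opens the array, and every array element is an object. The names iter_groups, filter_groups, and write_matches are made up for illustration.

```python
import json


def iter_groups(fp, chunk_size=1 << 20):
    """Yield the objects of the "groups" array one at a time.

    Assumes the first '[' in the file opens the array and that every
    element is a JSON object (so a cut-off element can never parse).
    """
    decoder = json.JSONDecoder()
    buf = fp.read(chunk_size)
    # Read ahead until the '[' that opens the array is in the buffer.
    while "[" not in buf:
        chunk = fp.read(chunk_size)
        if not chunk:
            return  # no array found
        buf += chunk
    pos = buf.index("[") + 1
    while True:
        # Skip whitespace and the commas that separate elements.
        while pos < len(buf) and buf[pos] in " \t\r\n,":
            pos += 1
        if pos < len(buf) and buf[pos] == "]":
            return  # end of the array
        try:
            obj, pos = decoder.raw_decode(buf, pos)
        except json.JSONDecodeError:
            # The element is incomplete: drop consumed text, read more.
            chunk = fp.read(chunk_size)
            if not chunk:
                raise ValueError("file ended inside the groups array")
            buf = buf[pos:] + chunk
            pos = 0
            continue
        yield obj


def filter_groups(groups, wanted_keys):
    """Yield only the groups that list at least one wanted product key."""
    wanted = set(wanted_keys)
    for group in groups:
        codes = group.get("productCodes") or []
        if any(code.get("value") in wanted for code in codes):
            yield group


def write_matches(in_path, out_path, wanted_keys):
    """Write matching groups to out_path, one JSON object per line."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for group in filter_groups(iter_groups(src), wanted_keys):
            dst.write(json.dumps(group) + "\n")
```

For example, write_matches("your_file.json", "output.jsonl", ["GBAY4", "ZA42A"]) would scan the file once and keep only one group in memory at a time, with each group written once even if several of its keys match. The output is one JSON object per line rather than a single JSON document, which keeps the writer simple.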
mlim1972
  • I tried to install jq and got this error (I use Windows): "ERROR: Could not build wheels for jq, which is required to install pyproject.toml-based projects". Is it possible to use jq on Windows? – Tyler Moore Jul 27 '23 at 23:12
  • There are a few ways to install jq on Windows: https://bobbyhadz.com/blog/install-and-use-jq-on-windows . The project site also has a .exe version: https://jqlang.github.io/jq/ – mlim1972 Jul 28 '23 at 03:48