I'm using ijson.kvitems to iterate over all of the keys in a JSON file that I have.

The JSON file looks like this:

{"filename":{"file_data":
{"name":"samplefile",
"filetype":"Miscellaneous",
"id":123,
"timestamp":"2020-10-08 00:20:00"}}}

Based on this answer, a simplified version of my code looks like this (v is a dictionary too):

import ijson

f = open('file.json')
for k, v in ijson.kvitems(f, ''):
    name = v['name']
    user_id = v['id']
    filetype = v['filetype']
    timestamp = v['timestamp']

I am only able to stream/read about 94% of the keys from the original file this way. I'm trying to figure out whether there is a way to get to the remaining 6%.

Thanks!!

Haminator
  • Please post a minimal reproducible example, incl. imports and sample input JSON. Asking for recommendations for off-site resources is off-topic on SO. – buran Jun 03 '21 at 05:37
  • I'd like to help, but from the question it's not clear what the exact problem is. In your example, which keys cannot be read? The example itself also seems wrong, because an empty path in `kvitems` should yield `filename` for `k` and the object under that key for `v` – Rodrigo Tobar Jun 04 '21 at 16:30
  • AFAIK, the empty path should allow access to all of the keys (filenames) in the dictionary, and v should contain the nested dictionary ({"file_data": {"name":...}}), and this is how I use kvitems. My problem is that when iterating over k, v in ijson.kvitems(f, ''), not all of the Ks (i.e. the filenames) are included in the iterator. – Haminator Jun 07 '21 at 07:31
  • Sorry, but the exact problem is still not clear (at least to me). Can you modify the example JSON document and test code to show exactly which keys are missing? Note also that the way you are indexing into `v` in the example code wouldn't work with the given JSON document, since access should look like `v['file_data']['name']`, etc – Rodrigo Tobar Jun 07 '21 at 11:38
  • I am unable to determine which keys are missing since there are a couple hundred thousand of them, and the iteration takes forever (data is in a remote repo, long story). Regarding the access: the way I understand it, being the value for the "filename" key, v *is* 'file_data', so you cannot access v['file_data']. What am I missing? Thanks again... – Haminator Jun 08 '21 at 07:55

1 Answer

The documentation for kvitems may not be fully clear: it returns key/value pairs at the given prefix, and it is not recursive. With your example document and code, this is what kvitems returns (note that at the time of writing ijson.dump isn't yet in the latest ijson release on PyPI, but it is available in the latest master version on GitHub):

echo '{
  "filename": {
    "file_data": {
      "name":"samplefile",
      "filetype":"Miscellaneous",
      "id":123,
      "timestamp":"2020-10-08 00:20:00"
    }
  }
}' | python -m ijson.dump -m kvitems
#: key, value
-------------
0: filename, {'file_data': {'name': 'samplefile', 'filetype': 'Miscellaneous', 'id': 123, 'timestamp': '2020-10-08 00:20:00'}}

Here key is filename, while value is the rest of the object, since that whole object is the value under filename. In particular keys like name or filetype will not be reported separately; if you wanted those (and their respective values) to be reported you'd have to use a filename.file_data prefix instead.

From the comments on the original question I'm guessing this is the actual problem, but I couldn't fit this more extensive explanation into a comment, so I'm posting it here to clarify things further, in the hope that it is also the actual answer to your problem.

Rodrigo Tobar