I have to read a large JSON file (about 3 GB) using Python. A garbage value '][' appears between the data in the JSON files. For small files, I used the script below to trim the garbage values.

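filename = r'C:\Users\user1\Downloads\samplefile.json'
# Read the whole file into memory (only feasible for small files)
with open(filename, encoding="utf8") as json_file:
    data = json_file.read()
# Merge the adjacent arrays by turning '][' into a comma
data = data.replace('][', ',')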

For large files, I used the script below to read the data and got the following error, which the script above had taken care of for smaller files.

Script:

import ijson
f=ijson.items(open(r'C:\Users\user1\Downloads\samplefile.json','r'),'item')

Error:

IncompleteJSONError: parse error: trailing garbage 82220.00,"NUMBER":1799106.00}][{"DATE":"2021092412504700000 (right here) ------^

I also tried read_json from pandas but ended up with the same error. Any ideas on how to trim this garbage value would be really helpful. I have not shared the file or any samples because the files are kept on a secure system.

I have also tried a file wrapper class, shown below, but I still end up with a MemoryError:

import ijson

class Foo(object):
    def __init__(self, fpath, mode , encoding):
        self.f = fpath
        self.mode = mode
        self.encoding = encoding
    def __enter__(self):
        print ('context begun')
        self.file = open(self.f, self.mode,encoding=self.encoding)
        self.file=self.file.read().replace('][',',')
        return self.file
    def __exit__(self, exc_type, exc_val, exc_tb):
        print ('closed')

        

with Foo(r'C:\Users\user1\Downloads\samplefile.json','r',encoding='utf-8') as json_file:
    objects = ijson.items(json_file, 'items')
Sandeep G
  • All these methods require that the JSON be valid. If you need to fix the JSON first, they won't work. – Barmar Sep 30 '21 at 15:54
  • You'll need to define a file wrapper class that performs the `][` replacement on the fly, then use that with `ijson`. – Barmar Sep 30 '21 at 15:56
  • 3 gigs of JSON? Are you sure it's not all garbage? – Jussi Nurminen Sep 30 '21 at 15:59
  • Have you tried replacing it with a comma in the middle, like `'],['`? Also, without more context on how the `][` appears, it's difficult to suggest a fix. – rv.kvetch Sep 30 '21 at 16:11
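
The wrapper suggested in the comments could look roughly like the sketch below. It is only an illustration, not a tested fix for the real file: the class name `BracketFixingReader` and the chunk sizes are made up here, and it assumes that '][' only ever occurs between two top-level arrays (a '][' inside a string value would be replaced as well, just as in the small-file script). The point is that `read()` returns one chunk at a time instead of calling `.read()` on the whole file, and holds back a trailing ']' so that a '][' split across two chunks is still caught.

```python
import io
import json


class BracketFixingReader:
    """Hypothetical file-like wrapper: replaces b'][' with b',' on the fly."""

    def __init__(self, raw):
        self._raw = raw      # underlying file object opened in binary mode
        self._held = b""     # a trailing b']' held back from the last chunk

    def read(self, size=-1):
        # 'size' is treated as a hint: the result can be one byte longer
        # or somewhat shorter than requested.
        while True:
            chunk = self._raw.read(size)
            data = self._held + chunk
            self._held = b""
            if chunk and data.endswith(b"]"):
                # This ']' may be the first half of '][', so keep it back
                # until the next chunk shows what follows.
                data, self._held = data[:-1], b"]"
            data = data.replace(b"][", b",")
            # Return only real data, or b"" once the source is exhausted.
            if data or not chunk:
                return data


# Small in-memory demonstration; a chunk size of 17 splits the '][' pair.
sample = io.BytesIO(b'[{"a":1},{"a":2}][{"a":3}]')
reader = BracketFixingReader(sample)
fixed = b"".join(iter(lambda: reader.read(17), b""))
print(json.loads(fixed))
```

With the real file, the wrapper would presumably be passed to ijson in place of the plain file object, for example `ijson.items(BracketFixingReader(open(filename, 'rb')), 'item')`, so that only one chunk is in memory at a time. ijson also appears to offer a `multiple_values=True` option for inputs made of several concatenated top-level values, which may make the wrapper unnecessary; its documentation is worth checking for the installed version.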

0 Answers