I am trying to parse a huge JSON file (around 20 GB). I read it one line at a time (each line is a JSON object) and extract the required details.

Example:

The data in the JSON file looks like this:

{
    {a: [], b: [], c: [], d: [],e: []},
    {a: [], b: [], c: [], d: [],e: []},
    .....,
    {a: [], b: [], c: [], d: [],e: []}, 
}

Snippet to parse:

count = 0;
with open(fileName) as fp:
    try:
        for line in fp:
        data_local = json.loads(line)
        count = count + 1
        #access the data_local["a"]
    except:
        print "Error found" , count , len(data_local["a"])

Error message (when the "except" block is not used):

Traceback (most recent call last):
  File "./xyzFile", line 606, in <module>
    for line in fp:
SystemError: Negative size passed to PyString_FromStringAndSize

Output (when the "except" block is used):

Error found  65 5392287

I found something similar on Stack Overflow, but it didn't help. I tried to debug by catching the exception: the error is thrown after reading 65 JSON objects (lines). Each JSON object is huge, both in size and in number of values.

Any lead on this would be appreciated.

Thanks

user3214392

1 Answer

Remove the try/except and run it again; that will probably show the real error. I think this error happens when the value being returned is too big to be handled.
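One way to check whether an oversized line is involved is to measure the length of every line without ever holding a whole line in memory. The following is only a rough sketch: the helper name `line_lengths` and the `chunk_size` parameter are made up for illustration, and it assumes lines are separated by `\n`.

import-free, fixed-size chunked reading:

```python
def line_lengths(path, chunk_size=1 << 20):
    """Yield the byte length of each line, reading in fixed-size chunks.

    Hypothetical helper: newline bytes are not counted in the lengths.
    """
    length = 0  # bytes seen so far in the current line
    with open(path, "rb") as fp:
        while True:
            chunk = fp.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                pos = chunk.find(b"\n", start)
                if pos == -1:
                    # No newline left in this chunk: the line continues.
                    length += len(chunk) - start
                    break
                yield length + (pos - start)
                length = 0
                start = pos + 1
    if length:
        # Last line without a trailing newline.
        yield length
```

Looping over `enumerate(line_lengths(fileName), 1)` and printing the largest value would show how big the line around number 65 really is. If it turns out to be around 2 GB or more, a size overflow would be a plausible, though unconfirmed, explanation for the "Negative size" message.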

The body of your for loop is not indented either.

Try it like this:

import json

count = 0
with open(fileName) as fp:
    for line in fp:
        # Each line is expected to hold one complete JSON object.
        data_local = json.loads(line)
        count = count + 1
        # access data_local["a"] here
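
If you still want some error handling, it may help to catch only the parsing error and report which line failed, instead of a bare except that hides the original exception. A minimal sketch, assuming one JSON object per line; `parse_lines` is a made-up helper name, not something from the json module:

```python
import json


def parse_lines(path):
    """Yield (line_number, parsed_object) for each non-empty line.

    Hypothetical helper: assumes one complete JSON object per line.
    """
    with open(path) as fp:
        for line_number, line in enumerate(fp, 1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                obj = json.loads(line)
            except ValueError as exc:
                # json.loads reports malformed input as ValueError
                # (json.JSONDecodeError is a subclass of it in Python 3).
                raise ValueError(
                    "line %d (%d chars): %s" % (line_number, len(line), exc)
                )
            yield line_number, obj
```

Because only ValueError is caught, anything else, such as the SystemError from the question, still propagates with its full traceback.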
Andrew Ryan