
I just tried to read a big JSON file (the Wikipedia JSON dump) line by line in Python and got this error:

Traceback (most recent call last):
  File "C:/.../test_json_wiki_file.py", line 19, in <module>
    test_fct()
  File "C:/.../test_json_wiki_file.py", line 12, in test_fct
    for line in f:
OSError: [Errno 9] Bad file descriptor

Here is my code:

import json

def test_fct():
    data = []
    i = 0
    with open('E:/.../20200713.json/20200713.json') as f:
        for line in f:
            data.append(json.loads(line))
            i = i + 1

        # The with block closes f on exit, so no explicit close() is needed.
        if i > 1:
            return data

test_data = test_fct()

The file is around 700 GB, and its description (https://www.wikidata.org/wiki/Wikidata:Database_download) states that it can be read line by line. I don't know if this matters, but the E:/ drive is an external hard drive.

Thank you for your help in advance :)

LaLeLo
  • You are trying to load 700 GB into memory, so no wonder it fails :) `data` will grow to the size of the file, because you append every parsed line of JSON to it – Rob Raymond Jul 20 '20 at 13:48

1 Answer


I don't have any firsthand experience opening large files in Python, but did you mean the path to be 20200713.json/20200713.json? Is the first component actually a directory with a .json extension? I'd also suggest first loading a smaller sample of the file (opening the whole thing might be hard, so maybe just use the more command in a terminal).
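Both suggestions can also be tried from Python itself. This is only a minimal sketch: `peek_lines` is a hypothetical helper name, and the UTF-8 encoding is an assumption about the file. It checks that the path points at a regular file (not a folder that happens to end in .json) and then reads only the first few lines.

```python
from itertools import islice
from pathlib import Path


def peek_lines(path, n=3):
    """Return the first n lines of a text file without reading the rest.

    peek_lines is a hypothetical helper; UTF-8 is an assumed encoding.
    """
    p = Path(path)
    if not p.is_file():
        # Covers a missing path as well as a directory with a .json suffix.
        raise FileNotFoundError(f"{p} is not a regular file")
    with p.open(encoding="utf-8") as f:
        # islice stops after n lines, so the rest of the file is never read.
        return [line.rstrip("\n") for line in islice(f, n)]
```

Calling it with the path from the question would either show the first lines or fail early with a message naming the path that is not a file.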

  • Thank you, the problem was actually the path. Even though it was the correct path, it seems to be a problem if the folder is called [...].json – LaLeLo Jul 20 '20 at 13:46
  • Sweet. I'm curious if you'll have to do anything special to load the file in a bit at a time once you get it. Also, mind marking this as the correct answer? – grahamjpark Jul 20 '20 at 20:08
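
On loading the file a bit at a time: one possible approach is to handle each line as it is read instead of appending everything to a list, which also avoids the memory problem raised in the comment on the question. The sketch below rests on an assumption about the dump layout, namely that the file is a single JSON array with `[` and `]` on their own lines and one object per line followed by a comma. `iter_entities` and the sample data are hypothetical.

```python
import json


def iter_entities(lines):
    """Yield one parsed JSON object per line, one at a time.

    Assumes (not verified here) that the dump is a JSON array with one
    object per line, each followed by a trailing comma.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue  # skip blank lines and the array brackets
        yield json.loads(line)


# Small in-memory stand-in for the real file (hypothetical data).
sample = ["[\n", '{"id": "Q1"},\n', '{"id": "Q2"}\n', "]\n"]
ids = [entity["id"] for entity in iter_entities(sample)]
```

With the real file, an open file object could be passed in place of `sample`, so that only one line is held in memory at a time.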