
I have a 15-16 GB file containing newline-separated JSON data (approximately 70-80 million records).

What is the easiest way to read such a large file in Python quickly and without consuming much memory?

Also, if the script fails partway through reading such a large file, how can I resume reading from the last line read using Python?

Lijo Abraham
    "What will be the best programming language to read such huge file?" I'm assuming both languages have libraries for parsing JSON streams, so use whichever language you're most comfortable with. If that's Perl, see [How can I stream JSON from a file?](http://stackoverflow.com/q/12460058/176646) – ThisSuitIsBlackNot Apr 25 '16 at 17:24
  • "Fast" is relative. Start with `for line in sys.stdin: process_data(json.loads(line))` and wrap with try/except, to skip lines that cause errors. – jfs Apr 25 '16 at 17:27
  • ijson (https://pypi.python.org/pypi/ijson/) is the way to handle streaming json in python. – nephlm Apr 25 '16 at 17:41
  • @nephlm: my file is not a proper JSON file. It contains JSON objects separated by newlines (\n). So will ijson work? – Lijo Abraham Apr 26 '16 at 06:05
  • @LijoAbraham: If each JSON object is on its own line (e.g. no newlines within the JSON objects), then J.F. Sebastian's method will work. I don't know if ijson will work on something that isn't a JSON stream. You'd have to test it. – nephlm Apr 26 '16 at 07:33
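
A minimal sketch along the lines suggested in the comments, assuming each JSON object really is on its own line. The file paths, the `process_data` function, and the checkpoint interval are placeholders, not part of the original question; the checkpoint is just the byte offset of the next unread line, saved periodically so a crashed run can seek back to it.

```python
import json
import os

DATA_FILE = "data.jsonl"        # placeholder: the 15-16 GB newline-delimited JSON file
CHECKPOINT_FILE = "offset.txt"  # placeholder: where the last byte offset is saved
CHECKPOINT_EVERY = 100_000      # save the offset every N lines to limit extra I/O

def process_data(obj):
    # placeholder for whatever work is done per record
    pass

# Resume from the last saved byte offset, if one exists.
start = 0
if os.path.exists(CHECKPOINT_FILE):
    with open(CHECKPOINT_FILE) as f:
        start = int(f.read().strip() or 0)

# Open in binary mode: iterating reads one line at a time (constant memory),
# and f.tell() stays usable inside the loop.
with open(DATA_FILE, "rb") as f:
    f.seek(start)
    for lineno, line in enumerate(f, 1):
        line = line.strip()
        if not line:
            continue
        try:
            process_data(json.loads(line))
        except ValueError:
            # skip malformed lines instead of aborting the whole run
            pass
        if lineno % CHECKPOINT_EVERY == 0:
            # Persist the offset of the next unread line; on a crash the run
            # restarts from here, reprocessing at most CHECKPOINT_EVERY lines.
            with open(CHECKPOINT_FILE, "w") as ckpt:
                ckpt.write(str(f.tell()))
```

If reprocessing even a partial batch is unacceptable, the checkpoint would have to be written after every line (slower) or the processing step made idempotent.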

0 Answers