I'm currently storing a set of records (nested dictionaries) as JSON, one record per line, in a file for a machine learning task. Reading them is a bottleneck, so I'm looking for a faster storage format. So far I have looked at pickle and msgpack, but both can emit newline bytes in their encoded output, which breaks the one-record-per-line layout and makes them a non-starter. Any suggestions?

Ankur Ankan
Maxim Khesin
  • Any particular reason you need one record per line? Why not just pickle the record list itself, and not care _how_ it's stored, as long as the computer can read it? – Kevin Jun 13 '13 at 15:00
  • Try cPickle if you're looking for something a little faster. – Lanaru Jun 13 '13 at 15:12
  • A faster storage format might be an [SSD](https://en.wikipedia.org/wiki/Solid-state_drive). Side note: I once needed to load a large pickled dictionary that took 5 minutes to load; using [PyPy](http://pypy.org/) reduced the time to 30 seconds or less. – Wesley Baugh Jun 13 '13 at 15:27
  • @Kevin, I need one record per line so that I can easily process ranges of the file. – Maxim Khesin Jun 13 '13 at 18:05
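
To make the trade-off in the comments concrete, here is a minimal sketch, not a recommended solution. It assumes made-up sample records and a temporary file, and `load_range` is a hypothetical helper name. It shows why binary pickle output cannot be split on newlines, and what Kevin's suggestion of pickling the whole list looks like when only a range is needed.

```python
import json
import os
import pickle
import tempfile

# Hypothetical sample records standing in for the nested dictionaries in the question.
records = [{"id": i, "features": {"x": i * 0.5, "tags": ["a", "b"]}} for i in range(100)]

# With default settings json.dumps emits no raw newline, so one record per line is safe.
json_lines = [json.dumps(r) for r in records]
assert all("\n" not in line for line in json_lines)

# Binary pickle output is arbitrary bytes: the small integer 10 is stored as byte 0x0A,
# which is the newline character, so splitting the file on newlines would corrupt records.
blob = pickle.dumps({"id": 10}, protocol=2)
has_newline = b"\n" in blob

# Kevin's suggestion: pickle the whole list once and ignore how it is laid out on disk.
path = os.path.join(tempfile.mkdtemp(), "records.pkl")
with open(path, "wb") as f:
    pickle.dump(records, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_range(path, start, stop):
    """Load the whole pickled list, then slice out the requested range."""
    with open(path, "rb") as f:
        return pickle.load(f)[start:stop]


subset = load_range(path, 10, 13)
```

Note that `load_range` still reads and decodes the entire file before slicing, so it only helps when the full list fits in memory and the decoding cost is acceptable.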

0 Answers