I'm currently storing a bunch of records (nested dictionaries) as JSON, one record per line, in a file for a machine learning task. Reading them is a bottleneck, so I'm looking for a faster storage format. So far I have looked at pickle and msgpack, but both can emit newline bytes in their encoded output, which makes them a non-starter for a record-per-line file. Any suggestions?
-
Any particular reason you need one record per line? Why not just pickle the record list itself, and not care _how_ it's stored, as long as the computer can read it? – Kevin Jun 13 '13 at 15:00
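For illustration, pickling the whole list as suggested might look like the sketch below. The sample `records` and the temporary file path are made up for the example; only `pickle.dump` and `pickle.load` come from the standard library. The trade-off is that the entire list has to be loaded at once.

```python
import os
import pickle
import tempfile

# Hypothetical sample records standing in for the nested dictionaries.
records = [{"id": i, "features": {"x": i * 0.5, "tags": ["a", "b"]}} for i in range(5)]

# Hypothetical location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "records.pkl")

# Write the whole list in one call using a binary protocol.
with open(path, "wb") as f:
    pickle.dump(records, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back in one call; newline bytes inside the file do not matter here.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```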
-
Try cPickle if you're looking for something a little faster. – Lanaru Jun 13 '13 at 15:12
-
A faster storage format might be an [SSD](https://en.wikipedia.org/wiki/Solid-state_drive). Side note: I once needed to load a large pickled dictionary that took 5 minutes to load; using [PyPy](http://pypy.org/) reduced the time to 30 seconds or less. – Wesley Baugh Jun 13 '13 at 15:27
-
@Kevin, I need one record per line so that I can easily process ranges in the file. – Maxim Khesin Jun 13 '13 at 18:05
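One possible way to keep range processing without relying on newlines is to length-prefix each binary record and remember where each one starts. The sketch below is only an illustration under stated assumptions: `write_records`, `read_range`, the 8-byte prefix, and the in-memory `offsets` list are all hypothetical choices, not something proposed in the thread. In practice the offsets would need to be saved alongside the data file.

```python
import os
import pickle
import struct
import tempfile

# 8-byte little-endian length prefix (an arbitrary choice for this sketch).
HEADER = struct.Struct("<Q")

def write_records(path, records):
    """Write each record as <length><pickle bytes>; return each record's byte offset."""
    offsets = []
    with open(path, "wb") as f:
        for rec in records:
            payload = pickle.dumps(rec, protocol=pickle.HIGHEST_PROTOCOL)
            offsets.append(f.tell())
            f.write(HEADER.pack(len(payload)))
            f.write(payload)
    return offsets

def read_range(path, offsets, start, stop):
    """Yield records start..stop-1, seeking directly to the first one."""
    if start >= stop or start >= len(offsets):
        return
    with open(path, "rb") as f:
        f.seek(offsets[start])
        for _ in range(start, min(stop, len(offsets))):
            (size,) = HEADER.unpack(f.read(HEADER.size))
            yield pickle.loads(f.read(size))

# Hypothetical records; the embedded "\n" shows that newlines in the data are harmless.
records = [{"id": i, "text": "line\nbreak", "nested": {"v": [i, i + 1]}} for i in range(10)]
path = os.path.join(tempfile.mkdtemp(), "records.bin")
offsets = write_records(path, records)
subset = list(read_range(path, offsets, 3, 6))
```

Because records are located by byte offset rather than by line, the encoder is free to emit any bytes it likes. Whether this is actually faster than JSON lines for a given dataset would need to be measured.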