In CPython, I can read a 1.6 GB file in 11 seconds using `lines = f.read().splitlines()`, while in PyPy3 the exact same code takes 2 minutes to read the file. Using `f.read().split('\n')` brings that down to 1 minute, but it is still much slower than CPython. The reason I want to use PyPy is that the loop I later run over `lines` is much faster in PyPy, but I'm still curious why PyPy is slower than CPython for file I/O, and how I might speed it up.

Peter Ruse
-
I'm investigating why pypy3 seems much slower than pypy2 and cpython there. I guess it's not exactly what you're looking for, but the next pypy3 release will show better results. – Armin Rigo Jul 13 '20 at 13:10
-
If you can, you may find that streaming the file works better than reading it all in one go: open it with `with open('path/to/file') as f:` and then loop with `for line in f: processline(line)`. – mattip Jul 14 '20 at 06:18
-
@mattip Thanks for your reply. Yes, I have noticed that, and that's what I ended up using. But I'm still curious why `f.read()` is so slow in PyPy... – Peter Ruse Jul 15 '20 at 01:22
-
@ArminRigo Thanks for your reply. Looking forward to the next pypy3 release. Please do let us know the reason for the difference in speed, if you figure it out :) – Peter Ruse Jul 15 '20 at 01:23
-
The main issue was internal: it turns out that pypy3 does 5 copies of the result of `read()` before returning it, whereas pypy2 manages with only 1. For large strings this is extremely costly. This was fixed in the changeset number 119c84856339, in case you want to dig. – Armin Rigo Jul 15 '20 at 12:00
-
Thank you @ArminRigo, for figuring out what the issue was. Where can I access the update? – Peter Ruse Jul 19 '20 at 19:42
-
https://buildbot.pypy.org/nightly/py3.6/. It probably helps a lot, but it may still be slower than CPython. Please report the updated measures you do! – Armin Rigo Jul 21 '20 at 11:27
-
Awesome. Once I've updated and run it again, I'll report back with results. Thanks @ArminRigo – Peter Ruse Jul 21 '20 at 20:32
-
@ArminRigo I tested the latest build for macOS (from 2020-07-03) on the same 1.6 GB file, and the performance is unfortunately the same as before. – Peter Ruse Jul 26 '20 at 21:55
-
Unsure how you expect a version older than your question to contain the fix I did in reply to your version. But indeed, I see there haven't been more recent builds on OS X. I'll ask around. – Armin Rigo Jul 28 '20 at 07:50
-
Oops, totally missed that. I was confusing July with June for some odd reason :) Thank you again for investigating. I’ll check in regularly for a new macOS build – Peter Ruse Jul 30 '20 at 03:33
-
I am having the same issues. Were you able to speed up reading/iterating a file using pypy? – user1179317 Aug 19 '20 at 23:31
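
For anyone who wants to compare the two approaches discussed in the comments, here is a minimal sketch, not a definitive benchmark. It times the whole-file `f.read().splitlines()` pattern from the question against streaming the file line by line. The helper names (`count_with_read_splitlines`, `count_with_streaming`, `timed`, `make_sample_file`) and the small generated sample file are assumptions made for illustration; substitute your own path and per-line work, and expect the numbers to depend on the interpreter build and platform.

```python
import os
import tempfile
import time


def count_with_read_splitlines(path):
    # Whole-file pattern from the question: read everything into one
    # string, then split it into a list of lines.
    with open(path) as f:
        lines = f.read().splitlines()
    return len(lines)


def count_with_streaming(path):
    # Streaming pattern from the comments: iterate over the file object,
    # so the full contents are never held in memory at once.
    count = 0
    with open(path) as f:
        for line in f:
            count += 1  # stand-in for the real per-line processing
    return count


def timed(func, path):
    # Returns (result, elapsed seconds) for a single call.
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


def make_sample_file(num_lines=100000):
    # Hypothetical sample data; replace with the real file when measuring.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(num_lines):
            f.write("line %d\n" % i)
    return path


if __name__ == "__main__":
    path = make_sample_file()
    try:
        for func in (count_with_read_splitlines, count_with_streaming):
            n, seconds = timed(func, path)
            print("%s: %d lines in %.3f s" % (func.__name__, n, seconds))
    finally:
        os.remove(path)
```

Running the same script under CPython and PyPy3 gives directly comparable timings for both patterns, which is the kind of measurement requested above.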