I need to read a file in 64KB chunks in a loop and process them, but stop 16 bytes before the end of the file: the last 16 bytes are tag metadata.
The file might be super large, so I can't read it all in RAM.
All the solutions I find are a bit clumsy and/or unpythonic.
with open('myfile', 'rb') as f:
    while True:
        block = f.read(65536)
        if not block:
            break
        process_block(block)
If 16 <= len(block) < 65536, it's easy: it's the last block, so useful_data = block[:-16] and tag = block[-16:].

If len(block) == 65536, it could mean three things: the full block is useful data; or this 64KB block is in fact the last one, so useful_data = block[:-16] and tag = block[-16:]; or this 64KB block is followed by one last block of only a few bytes (let's say 3 bytes), in which case useful_data = block[:-13] and tag = block[-13:] + last_block.
How to deal with this problem in a nicer way than distinguishing all these cases?
Note: the solution should work for a file opened with open(...), but also for an io.BytesIO() object, or for a remote SFTP-opened file (with pysftp).

I was thinking about getting the file object size, with
f.seek(0, 2)
length = f.tell()
f.seek(0)
Then after each block = f.read(65536) we can know how far we are from the end with length - f.tell(), but again the full solution does not look very elegant.