
I wanted to know how the Python hashlib library treats sparse files. If a file contains a lot of zero blocks, does it do any optimization to avoid wasting CPU and memory on reading them, such as scanning the inode block map and reading only the allocated blocks to compute the hash?

If it does not do this already, what would be the best way to do it myself?

PS: I'm not sure whether it would be appropriate to post this question on Stack Overflow Meta.

Thanks.

CodeWithPride
    If you copied your sparse file onto a different filesystem that didn't support sparse files (or that had a different block size, so that a different set of blocks was omitted), would you seriously want it to have a different hash? – jasonharper Nov 04 '16 at 20:26

1 Answer


The hashlib module doesn't work with files at all. You have to read the data in yourself and pass blocks to the hashing object, so there is no reason to expect it to handle sparse files specially.
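The usual pattern is to read the file in fixed-size chunks and feed each one to the hash object. A minimal sketch (the function name and chunk size are my own choices, not anything hashlib prescribes):

```python
import hashlib

def hash_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file by feeding fixed-size chunks to a hashlib object.

    hashlib itself never sees the file; it only sees the bytes we
    read and pass to update().
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means EOF
                break
            h.update(chunk)
    return h.hexdigest()
```

Holes in a sparse file simply come back from `read()` as runs of zero bytes, so they are hashed like any other data.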

The I/O layer doesn't do anything special for sparse files either, but that's the OS's job: if the kernel knows a block is a hole, the read operation just fills your buffer with zeroes without touching the disk.
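If you want to avoid even the `read()` system calls over holes, platforms that support `os.SEEK_DATA` / `os.SEEK_HOLE` (Linux, Solaris, macOS; Python 3.3+) let you locate the allocated regions and feed a reused zero buffer for the holes. A sketch under those assumptions; the function name is mine, and the digest stays identical to a plain sequential read, which addresses the concern in the comment above. On filesystems that don't report holes, `SEEK_DATA` just treats the whole file as data, so the result is still correct:

```python
import hashlib
import os

def hash_sparse_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file, feeding holes to the hash as zeroes without
    issuing read() calls over them. Hypothetical helper; requires
    os.SEEK_DATA / os.SEEK_HOLE support."""
    h = hashlib.new(algorithm)
    zeros = bytes(chunk_size)          # reusable zero buffer for holes
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                data_start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError:            # ENXIO: only a hole remains
                data_start = size
            # Feed the hole [pos, data_start) as zeroes, no I/O.
            remaining = data_start - pos
            while remaining > 0:
                n = min(remaining, chunk_size)
                h.update(zeros[:n])
                remaining -= n
            if data_start >= size:
                break
            # Find the end of this data region, then read it normally.
            hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
            os.lseek(fd, data_start, os.SEEK_SET)
            remaining = hole_start - data_start
            while remaining > 0:
                chunk = os.read(fd, min(remaining, chunk_size))
                if not chunk:
                    break
                h.update(chunk)
                remaining -= len(chunk)
            pos = hole_start
    finally:
        os.close(fd)
    return h.hexdigest()
```

Note this only saves syscall overhead and memory traffic; as the answer says, the kernel already avoids disk I/O for holes, and the CPU cost of hashing the zero bytes themselves is unavoidable if the digest is to match a normal read.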

ShadowRanger