I've written a Python script that uses OpenCV to read images off a drive, do a little bit of processing, and store them in a buffer. To speed it up, I created a multithreaded version using Python's "threading" module that spawns a number of workers that read the files in parallel.
(Keeping everything in memory would be the ideal solution, but there are too many files and not enough memory.)
The implementation is very simple: each worker is given a list of filenames, reads each one with cv2.imread(), processes it, and stores the result in a buffer until it is asked for.
I've tested this script multiple times on two machines, and observed the following:
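For reference, a minimal sketch of this kind of setup (not the actual script). The names load_fn, process_fn and load_all are hypothetical placeholders: load_fn stands in for cv2.imread, process_fn for the processing step, and the buffer is assumed to be a queue.Queue.

```python
import threading
from queue import Queue


def worker(filenames, load_fn, process_fn, buffer):
    """Read each file, process it, and push the result into a shared buffer."""
    for name in filenames:
        image = load_fn(name)  # stand-in for cv2.imread(name)
        if image is None:      # cv2.imread returns None when a file cannot be read
            continue
        buffer.put((name, process_fn(image)))


def load_all(filenames, load_fn, process_fn, num_workers=4):
    """Split filenames into disjoint chunks and read them on num_workers threads."""
    buffer = Queue()  # thread-safe; unbounded here for simplicity
    chunks = [filenames[i::num_workers] for i in range(num_workers)]
    threads = [
        threading.Thread(target=worker, args=(chunk, load_fn, process_fn, buffer))
        for chunk in chunks
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Drain the buffer once every worker has finished.
    results = {}
    while not buffer.empty():
        name, item = buffer.get()
        results[name] = item
    return results
```

Note that the striding in load_all hands each filename to exactly one worker, so in this sketch no two threads ever read the same file.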
Windows PC, SATA SSD, Single-Threaded: This is the baseline, and it works fine.
Windows PC, SATA SSD, Multi-Threaded: Significant speedup that scales very well with worker count, and works fine.
Ubuntu PC, NVMe SSD, Single-Threaded: Again, this is just reading files in a loop, and it also works fine.
Ubuntu PC, NVMe SSD, Multi-Threaded: Does not seem to be any faster than the single-threaded version, and is considerably slower than the multi-threaded version on the SATA SSD. It does its job up until the image files on the drive start becoming corrupt. A handful become corrupt, and the script crashes because it can't open them.
Reading the affected files programmatically produces the error "libpng error: bad adaptive filter value". They also can't be opened with a photo viewer or any other tool I've tried. On a cursory inspection they seem to have been truncated.
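One way to check the truncation more systematically, as a rough sketch: a well-formed PNG starts with a fixed 8-byte signature and normally ends with an empty IEND chunk, so a file missing either is suspect. The function name looks_truncated is made up for illustration, and the check assumes there is no extra data after IEND.

```python
import os

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Final chunk of a PNG: zero length, type "IEND", and its fixed CRC.
PNG_TRAILER = b"\x00\x00\x00\x00IEND\xae\x42\x60\x82"


def looks_truncated(path):
    """Return True if the file lacks the PNG signature or the IEND trailer."""
    size = os.path.getsize(path)
    if size < len(PNG_SIGNATURE) + len(PNG_TRAILER):
        return True
    with open(path, "rb") as f:  # opened read-only, never written to
        head = f.read(len(PNG_SIGNATURE))
        f.seek(-len(PNG_TRAILER), os.SEEK_END)
        tail = f.read()
    return head != PNG_SIGNATURE or tail != PNG_TRAILER
```

This only inspects the first and last few bytes, so it flags files cut off at the end but says nothing about damage in the middle of the image data.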
I initially wrote the script on my Windows PC, where it worked fine. On the Ubuntu machine I've replaced the damaged images and run a few trials to check whether the multi-threaded script is the cause of the file corruption I'm seeing. It does seem to be the case.
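A sketch of how such a trial could be made more rigorous, assuming the images sit in a single directory and match "*.png": record a size and SHA-256 digest for every file before a run, do the same afterwards, and compare. The helper names snapshot and changed_files are illustrative only.

```python
import hashlib
from pathlib import Path


def snapshot(directory, pattern="*.png"):
    """Map each matching file name to (size in bytes, SHA-256 hex digest)."""
    result = {}
    for path in sorted(Path(directory).glob(pattern)):
        data = path.read_bytes()
        result[path.name] = (len(data), hashlib.sha256(data).hexdigest())
    return result


def changed_files(before, after):
    """Names whose size or digest changed, or that vanished, between snapshots."""
    return sorted(name for name in before if after.get(name) != before[name])
```

Taking a snapshot, running the multi-threaded script, taking a second snapshot and passing both to changed_files would list exactly which files were altered and whether their size shrank.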
My best guess is that two threads trying to read the same image at the same time is the culprit. I'm not sure of this, however: it is a read operation, so naively I wouldn't expect it to change the data at all, and it doesn't seem to cause any issues on the Windows machine with the SATA drive.
I was also expecting a similar speed boost on the Ubuntu machine, and it's curious that this isn't the case.
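Whether two workers can even be handed the same file is easy to rule in or out. A small sketch, assuming the per-worker filename lists are available as a list of lists (shared_filenames is a hypothetical helper name):

```python
from collections import Counter


def shared_filenames(worker_lists):
    """Return filenames that appear in more than one worker's list."""
    # set(names) ensures a repeat inside one worker's own list is not counted.
    counts = Counter(name for names in worker_lists for name in set(names))
    return sorted(name for name, n in counts.items() if n > 1)
```

An empty result would mean no file is assigned to two workers, which would rule out the concurrent-read theory for this script.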
The drive in question is a Samsung 970 Evo Plus.