
I'm trying to figure out a way to improve a C++ Win32 program I've made which basically recursively traverses a given folder and, for every file found, computes a hash (let's say MD5, but it could be any sort of CPU-expensive computation). Since this is an I/O-bound application, most of the time the process is waiting for I/O to finish, hence not using as much CPU as it could. Even doing this with a thread pool would probably (am I wrong?) not solve the issue: every thread would block waiting for its I/O to complete, plus there would be context-switching overhead.

So I'm starting to consider doing this with overlapped reads: every time I collect a new file to process, I would enqueue a non-blocking read operation, with one thread processing completion callbacks and hashing every chunk it receives from the queue. Theoretically this should avoid the process hanging on I/O waits, and I should see CPU usage increase and thus an overall speedup.

I have the following questions:

  • I am assuming this will increase the overall performance of the application; am I right? If not, why?
  • Are I/O completion events guaranteed to be ordered the same way as the read operations? I mean, if I read N bytes from offsets A, B and C of a file, will I get the completion events for A, B and C in that order, or could they arrive in an unpredictable order?
  • I'm searching for a library or some code samples to implement this whole mechanism. Should I use an IOCP, or simply RegisterWaitForSingleObject with custom callbacks? I can't seem to find examples for multiple-file I/O; everything I find is either an example of overlapped reads on a single file, or an IOCP with sockets. Can you point me in the right direction?
  • Wouldn't a thread pool be useless in this case? A single-threaded approach should be good enough (following the nginx/libevent approach, for instance), right?
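For what it's worth, the single-thread pipeline described above (one thread draining a queue of completed reads and hashing each chunk in order) can be modelled portably. This is only a sketch under assumptions: a reader thread stands in for the overlapped completion callback, and a toy rolling hash stands in for MD5/CRC32. `ChunkQueue`, `hash_update` and `hash_file_pipelined` are hypothetical names, not part of any real API.

```cpp
// Portable model of the proposed pipeline: a producer posts completed
// reads to a queue (standing in for the overlapped completion callback),
// and a single consumer thread hashes each chunk in order.
#include <condition_variable>
#include <cstdint>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

struct Chunk {
    std::vector<uint8_t> data; // bytes read from the file
    bool last = false;         // sentinel marking end of stream
};

class ChunkQueue {
    std::queue<Chunk>       q_;
    std::mutex              m_;
    std::condition_variable cv_;
public:
    void push(Chunk c) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(c)); }
        cv_.notify_one();
    }
    Chunk pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        Chunk c = std::move(q_.front());
        q_.pop();
        return c;
    }
};

// Toy rolling hash standing in for MD5/CRC32; folding chunk by chunk
// gives the same result as hashing the whole buffer at once.
uint32_t hash_update(uint32_t h, const std::vector<uint8_t>& data) {
    for (uint8_t b : data) h = h * 31u + b;
    return h;
}

uint32_t hash_file_pipelined(const std::string& path, size_t chunk_size = 64 * 1024) {
    ChunkQueue queue;
    std::thread reader([&] { // plays the role of the I/O side
        std::ifstream in(path, std::ios::binary);
        for (;;) {
            Chunk c;
            c.data.assign(chunk_size, 0);
            in.read(reinterpret_cast<char*>(c.data.data()), chunk_size);
            c.data.resize(static_cast<size_t>(in.gcount()));
            if (c.data.empty()) {        // EOF (or open failure)
                Chunk done; done.last = true;
                queue.push(std::move(done));
                break;
            }
            queue.push(std::move(c));
        }
    });
    uint32_t h = 0;                      // single hashing consumer
    for (;;) {
        Chunk c = queue.pop();
        if (c.last) break;
        h = hash_update(h, c.data);
    }
    reader.join();
    return h;
}
```

The same structure would hold with real overlapped reads: the completion callback would do the `push`, and the consumer loop would stay unchanged.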

Please do not answer with alternative solutions; I just want to implement an OVERLAPPED operations queue the best way I can, and I'm not interested in anything else (unless proven to be more efficient for my scenario, of course).

EDIT:

What the current implementation of the software looks like (of course the app is not exactly like this; this is just to give an idea):

DWORD crc32( PBYTE data, DWORD size )
{
    // compute the crc32 of the data and return it
}

void on_file_callback( const char *pszFileName )
{
    PBYTE file_map = ...; // Open the file and memory-map it (file_size holds its length).

    if( crc32( file_map, file_size ) == 0xDEADBEEF )
    {
        printf( "OMG!!!\n" );
    }
    // Cleanup
}

int main( int argc, char **argv )
{
    const char *pszFolder = "c:\\";

    // recurse pszFolder and call 'on_file_callback' on every file found
    recurse_directory( pszFolder, on_file_callback );
}

Thanks.

Simone Margaritelli
  • Reading from multiple files simultaneously may make things worse because you incur head thrashing. If the operation truly is I/O-bound, then the limiting factor is how fast you can read data off the hard drive, and that typically is fastest when you read sequentially. – Raymond Chen Apr 03 '14 at 22:11
  • Sure, but I'm not interested in single-file processing performance, rather in the overall performance (time_ended - time_started) of the application; while I'm waiting for an I/O to complete, I could start to process what I already have, or schedule the next read. – Simone Margaritelli Apr 03 '14 at 22:14
  • Perhaps use three threads: 1 for GUI/interaction, 1 for directory traversal / file opening, 1 for chunk processing? Might be faster if there is much latency for directory operations... – Deduplicator Apr 03 '14 at 22:16
  • No GUI, the app is completely automated and has no UI at all :) I'm afraid the latency is in the file read operations, not in the directory recursion. – Simone Margaritelli Apr 03 '14 at 22:18
  • So, you don't use it e.g. over the network, with the server on the other side of the globe? OK, one thread less (though it might make an infinitesimal difference). No GUI -> reduced by one more thread; just use well-buffered console output, or no output at all. Hm, single-threaded it is, then – Deduplicator Apr 03 '14 at 22:20
  • No networking, and no output in most cases ... just a SetEvent if a file meets a particular condition, which of course is not the bottleneck of the application :) – Simone Margaritelli Apr 03 '14 at 22:24
  • My point is that if you issue a read from file B while waiting for the first read from file A to complete, then that will make the second read from A even slower because the drive head needs to seek all the way from B back to A. In other words, reading in the order A1 B1 A2 B2 A3 B3 is much slower than A1 A2 A3 B1 B2 B3. If you're going to issue multiple reads, then go A1 A2 A3 ... An. I.e. prefetch the next piece of the file you will need, rather than prefetching an unrelated file. – Raymond Chen Apr 03 '14 at 22:27
  • Oh I see :) Btw to make it more clear, I've edited the question with some sample code. – Simone Margaritelli Apr 03 '14 at 22:31
  • Raymond is right, but I don't think that's the actual point of the question. Using overlapped/completion-port code makes sense only if your program has something else worthwhile to do. The `and has no ui at all` comment is essential; it's pretty likely that you actually *don't* have anything worthwhile to do. You are just waiting for those async I/O operations to complete. This is not useful at all; it actually makes your program slower. Synchronous I/O is always faster. – Hans Passant Apr 03 '14 at 22:42
  • Uhm, so if I have to process 10000 files, my synchronous I/O approach is faster than anything else I might use (in overall processing time, not per file)? – Simone Margaritelli Apr 03 '14 at 22:45
  • Get 10000 disks, it will be faster. You can create 10000 handles on the files and start an overlapped I/O operation on each of them. They will not complete in order; that's the only way to get ahead. – Hans Passant Apr 03 '14 at 22:50
  • @HansPassant: I don't think synchronous I/O is inherently faster, is it? I'm fairly sure synchronous I/O calls are implemented as an asynchronous call followed by a wait, so it shouldn't really make any difference. – Harry Johnston Apr 04 '14 at 01:20
  • It doesn't make sense, as already discussed, to attempt to read more than one file simultaneously. To maximize performance, however, you do want to be able to hash the data from one read operation while the next read operation is pending. Doing this in a single thread with asynchronous I/O would be slightly more efficient than doing the hashing in a separate thread, but not significantly. In this sort of scenario the simple form of asynchronous I/O (using an event) isn't particularly complicated, so if you do prefer that over multithreading go ahead, but don't expect huge performance gains. – Harry Johnston Apr 04 '14 at 01:29
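To illustrate the read-ahead pattern suggested in the last comments (hash the current chunk while the next read is pending, keeping reads sequential within one file), here is a portable sketch. `std::async` stands in for a pending overlapped `ReadFile`, and the toy rolling hash stands in for the real digest; `read_chunk` and `hash_file_readahead` are hypothetical names.

```cpp
// Keep exactly one read in flight: while the current chunk is being
// hashed, the next read is already pending (std::async stands in for an
// outstanding overlapped ReadFile on the same file).
#include <cstdint>
#include <fstream>
#include <future>
#include <string>
#include <vector>

static std::vector<uint8_t> read_chunk(std::ifstream& in, size_t n) {
    std::vector<uint8_t> buf(n);
    in.read(reinterpret_cast<char*>(buf.data()), n);
    buf.resize(static_cast<size_t>(in.gcount()));
    return buf;
}

uint32_t hash_file_readahead(const std::string& path, size_t chunk = 64 * 1024) {
    std::ifstream in(path, std::ios::binary);
    uint32_t h = 0;
    // Issue the first read; from then on, one read is always pending.
    auto pending = std::async(std::launch::async, read_chunk, std::ref(in), chunk);
    for (;;) {
        std::vector<uint8_t> cur = pending.get(); // wait for "completion"
        if (cur.empty()) break;                   // EOF
        // Start the next read *before* hashing the current chunk, so the
        // two overlap -- reads stay sequential within the file, as the
        // comments above recommend.
        pending = std::async(std::launch::async, read_chunk, std::ref(in), chunk);
        for (uint8_t b : cur) h = h * 31u + b;    // toy hash stand-in
    }
    return h;
}
```

Only one read touches the stream at a time (the next read is issued only after the previous one was collected with `get()`), so no extra synchronization is needed; the overlap is purely between hashing and I/O, which matches the modest gain the last comment predicts.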

0 Answers