If I have multiple threads generating blocks of a file, what is the best way to write out the blocks?
For example: 5 threads work on a file of 500 blocks. Block 0 is not necessarily completed before block 1, but the output file on disk needs to be in order (block 0, block 1, block 2, ..., block 499).
The program is in C++. Can fwrite() somehow "random access" the file? The file is created from scratch, so when block 5 is completed the file may still have size 0 because blocks 0 to 4 are not completed yet. Can I write out block 5 directly (with a proper fseek())?
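To make the question concrete, this is roughly the kind of out-of-order write I mean. It is only a minimal sketch, not tested at scale. It uses POSIX `pwrite()` instead of `fseek()` + `fwrite()` (an assumption on my part, chosen because `pwrite()` takes the offset as an argument), and `write_block_at` is a hypothetical helper name. On Linux, a write beyond the current end of the file extends the file, and the unwritten gap reads back as zeros until the earlier blocks are filled in.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical helper: write one fixed-size block at its final position.
// pwrite() takes an explicit offset and does not move the file position,
// so no separate seek is needed.
bool write_block_at(int fd, std::size_t block_index,
                    const char* data, std::size_t block_size) {
    assert(block_size > 0);
    const off_t offset =
        static_cast<off_t>(block_index) * static_cast<off_t>(block_size);

    std::size_t done = 0;
    while (done < block_size) {
        // pwrite() may write fewer bytes than requested, so loop until done.
        ssize_t n = pwrite(fd, data + done, block_size - done,
                           offset + static_cast<off_t>(done));
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, retry
            return false;                  // real I/O error
        }
        done += static_cast<std::size_t>(n);
    }
    return true;
}
```

With this, block 5 would be written with `write_block_at(fd, 5, buf, block_size)` on a freshly created file, and blocks 0 to 4 would be filled in whenever they finish.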
This piece of code is performance critical, so I'm really curious about anything that can improve performance. This looks like a multiple-producer (block generators), single-consumer (output writer) scenario. The ideal case is that thread A can continue generating the next block as soon as it completes the previous one.
If fwrite() can be "random", then the output writer can simply take the outputs, seek, and then write. However, I'm not sure whether this design performs well at large scale.
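The writer described above could be sketched like this. It is only an illustration under stated assumptions: `Block`, `BlockQueue`, and `writer_loop` are hypothetical names, the queue is a plain mutex/condition-variable queue rather than anything tuned for throughput, and `fseeko()` (POSIX) is used instead of `fseek()` so that offsets beyond 2 GB stay safe on platforms where `long` is 32 bits.

```cpp
#include <sys/types.h>

#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical type: one finished block plus its position in the file.
struct Block {
    std::size_t index;       // block number in the output file
    std::vector<char> data;  // payload, always block_size bytes
};

// Hypothetical multi-producer, single-consumer queue.
class BlockQueue {
public:
    void push(Block b) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(b));
        }
        cv_.notify_one();
    }
    // Call once after all generator threads have finished.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Waits for a block; returns false once the queue is closed and drained.
    bool pop(Block& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Block> q_;
    bool closed_ = false;
};

// Writer: take whichever block is ready, seek to index * block_size, write.
// Returns false on the first seek or write error.
bool writer_loop(std::FILE* out, BlockQueue& queue, std::size_t block_size) {
    Block b;
    while (queue.pop(b)) {
        assert(b.data.size() == block_size);
        const off_t offset =
            static_cast<off_t>(b.index) * static_cast<off_t>(block_size);
        if (fseeko(out, offset, SEEK_SET) != 0) return false;
        if (std::fwrite(b.data.data(), 1, block_size, out) != block_size)
            return false;
    }
    return true;
}
```

Each generator thread would call `queue.push(...)` as soon as a block is done and move straight on to the next one. A single thread runs `writer_loop`, and `queue.close()` is called after all generators have been joined.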
Some constraints:
- Each block is of the same size, generated in memory
- The block size is known in advance, but the total number of blocks is not.
- The total size is a few GB.
- There could be multiple jobs running on one server. Each job is as described above, with its own independent generators and writer, in a different process.
- The server is a Linux/CentOS machine.