Optimal performance depends on the hard drive, drive fragmentation, the filesystem, the OS and the processor. But you will never achieve it by writing small chunks of data that do not align well with the filesystem's block structure.
A simple solution is to use a memory-mapped file and let the OS deal with committing the data to disk asynchronously. That way it is likely to be near-optimal for the system you are running on, without you having to account for all the possible variables or work out the optimal write block size for your system.
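A minimal sketch of the memory-mapped approach, assuming a POSIX system; the file name, file size and fill pattern are placeholders:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const size_t kFileSize = 64 * 1024 * 1024;  // 64 MiB, chosen arbitrarily

    int fd = open("data.bin", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    // The file must be extended to its final size before mapping it.
    if (ftruncate(fd, kFileSize) != 0) { perror("ftruncate"); return 1; }

    void* map = mmap(nullptr, kFileSize, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    // Writes to the mapping are plain memory stores; the OS schedules
    // the actual disk I/O (write-back) asynchronously.
    char* p = static_cast<char*>(map);
    for (size_t i = 0; i < kFileSize; ++i)
        p[i] = static_cast<char>(i & 0xFF);

    // Optionally force write-back before unmapping; otherwise the OS
    // flushes dirty pages in its own time.
    msync(map, kFileSize, MS_SYNC);
    munmap(map, kFileSize);
    close(fd);
    return 0;
}
```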
Even with regular stream I/O you can improve performance drastically by writing to a RAM buffer, and making the buffer size a multiple of your filesystem's block size is likely to be close to optimal. However, since file writes may block if there is insufficient buffering in the filesystem itself for queued writes or write-back, you may not want to make the buffer too large if the data generation and the file writes occur in a single thread.
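For example, with a C++ ofstream you can install a larger stream buffer before opening the file. A sketch; the 64 KiB size is a guess at a multiple of a typical 4 KiB filesystem block, and the file name is a placeholder:

```cpp
#include <fstream>
#include <vector>

int main() {
    std::vector<char> buf(64 * 1024);  // multiple of a typical 4 KiB block

    std::ofstream out;
    // On common implementations (e.g. libstdc++) pubsetbuf only takes
    // effect if called before the file is opened.
    out.rdbuf()->pubsetbuf(buf.data(),
                           static_cast<std::streamsize>(buf.size()));
    out.open("data.bin", std::ios::binary);

    for (int i = 0; i < 1000000; ++i) {
        // Many small writes coalesce in the RAM buffer and reach the
        // OS in 64 KiB chunks rather than 4-byte ones.
        out.write(reinterpret_cast<const char*>(&i), sizeof(i));
    }
    return 0;  // the ofstream destructor flushes and closes the file
}
```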
Another solution is to have a separate writer thread, connected to the thread generating the data via a pipe or queue. The writer thread simply buffers data from the pipe/queue until it has a "block" (again, matching the filesystem block size is a good idea), then commits the block to the file. The pipe/queue then acts as a buffer, storing data generated while the writer thread is stalled writing to the file. The buffering afforded by the pipe, the block, the filesystem and the disk's write cache will likely absorb any disk latency, so long as the fundamental write performance of the drive is faster than the rate at which data is being generated - nothing but a faster drive will solve that problem.
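A sketch of that design using a std::queue guarded by a mutex and condition variable; the block size, chunk size and file name are illustrative, not prescriptive:

```cpp
#include <condition_variable>
#include <cstddef>
#include <fstream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

constexpr std::size_t kBlockSize = 64 * 1024;  // ideally a multiple of the FS block size

std::mutex m;
std::condition_variable cv;
std::queue<std::vector<char>> q;
bool done = false;

void writer() {
    std::ofstream out("data.bin", std::ios::binary);
    std::vector<char> block;
    block.reserve(kBlockSize);
    for (;;) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return !q.empty() || done; });
        if (q.empty() && done) break;
        std::vector<char> chunk = std::move(q.front());
        q.pop();
        lock.unlock();  // the generator can keep queueing while we buffer/write

        block.insert(block.end(), chunk.begin(), chunk.end());
        if (block.size() >= kBlockSize) {  // commit one full block at a time
            out.write(block.data(), static_cast<std::streamsize>(block.size()));
            block.clear();
        }
    }
    if (!block.empty())  // flush the final partial block
        out.write(block.data(), static_cast<std::streamsize>(block.size()));
}

int main() {
    std::thread t(writer);
    for (int i = 0; i < 10000; ++i) {
        std::vector<char> chunk(512, static_cast<char>(i));  // "generated" data
        {
            std::lock_guard<std::mutex> lock(m);
            q.push(std::move(chunk));
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;  // signal the writer that no more data is coming
    }
    cv.notify_one();
    t.join();
    return 0;
}
```

If the generator can outpace the writer for long stretches, you would also want to bound the queue (blocking or dropping when it is full), otherwise it grows without limit.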