
I have a string composed of some packet statistics, such as packet length.

I would like to store this to a csv file, but if I use the standard fprintf to write to a file, it writes incredibly slowly, and I end up losing information.

How do I write information to a file as quickly as possible, so as to minimize information loss from packets? Ideally I would like to support millions of packets per second, which means I need to write millions of lines per second.

I am using XDP to get packet information and send it to the userspace via an eBPF map if that matters.

Qeole
  • Can you write larger chunks to the file? Instead of writing a single value, write hundreds or thousands? Maybe use a double buffer: write new values to one while the contents of the other are being written to the file. – Fiddling Bits Jan 14 '20 at 19:41
  • Do you close the stream between two consecutive writes? – Roberto Caboni Jan 14 '20 at 19:48
  • "...millions of packets per second..." tells us nothing given that these are strings of undisclosed length. How long are these strings? Also, what is the _burst length_ - does this data stream indefinitely, or could it be buffered and committed during "quiet" periods? – Clifford Jan 14 '20 at 19:51
  • (Do you understand that it is difficult to provide help without your code?) – Roberto Caboni Jan 14 '20 at 19:53
  • You realise that asking for the "fastest way" is seldom productive or necessary. You only need it to be _fast enough_. By specifying the actual performance requirement, you increase the number of possible solutions, ranging from those that are "fast enough" to those that might be "fastest". A sub-optimal but adequate solution might be preferable for other reasons such as portability or simplicity. So specify the performance requirement in a quantifiable manner, and describe the hardware it must run on. – Clifford Jan 14 '20 at 20:29
  • `fprintf` is not your bottleneck. It's possible that file I/O is. If the device to which you are writing the log cannot keep up with the incoming packets, no software fix will help. Perhaps you'll want to log to an SSD, or even create a RAM disk? – Lee Daniel Crocker Jan 14 '20 at 21:00
  • `fprintf` is already buffered, and is probably much faster than anything you can manage to come up with yourself. Naturally this assumes that it is used correctly, which is hard to say unless you show us your code. – HAL9000 Jan 14 '20 at 22:49

2 Answers


The optimal performance will depend on the hard drive, drive fragmentation, the filesystem, the OS and the processor. But optimal performance will never be achieved by writing small chunks of data that do not align well with the filesystem's disk structure.

A simple solution would be to use a memory mapped file and let the OS asynchronously deal with actually committing the data to the file - that way it is likely to be optimal for the system you are running on without you having to deal with all the possible variables or work out the optimal write block size of your system.
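A minimal sketch of the memory-mapped approach on Linux, assuming the maximum size of the log can be estimated (and over-allocated) up front; the file name, the 1 GiB cap and the omitted error handling are placeholders, not recommendations:

```c
/* Log CSV lines through a memory-mapped file; the kernel writes the
 * dirty pages back asynchronously. Error handling is abbreviated. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_BYTES (1ULL << 30)            /* reserve 1 GiB for the log */

int main(void)
{
    int fd = open("packets.csv", O_RDWR | O_CREAT | O_TRUNC, 0644);
    ftruncate(fd, MAP_BYTES);             /* pre-size the backing file */
    char *base = mmap(NULL, MAP_BYTES, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    size_t off = 0;

    for (int i = 0; i < 1000000; i++)     /* stand-in for the packet loop */
        off += snprintf(base + off, MAP_BYTES - off,
                        "%d,%d\n", i, 64 + (i % 1400));

    munmap(base, MAP_BYTES);              /* kernel flushes remaining dirty pages */
    ftruncate(fd, off);                   /* trim the unused tail */
    close(fd);
    return 0;
}
```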

Even with regular stream I/O you will improve performance drastically by writing to a RAM buffer. Making the buffer size a multiple of the block size of your file system is likely to be optimal. However, since file writes may block if there is insufficient buffering in the file system itself for queued writes or write-back, you may not want to make the buffer too large if the data generation and the data write occur in a single thread.
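For illustration, a large stdio buffer can be installed with `setvbuf` so that `fprintf` only reaches the kernel once per buffer rather than once per line; the 4 MiB size and the file name below are arbitrary choices:

```c
/* Fully buffered stream with a caller-supplied 4 MiB buffer. */
#include <stdio.h>
#include <stdlib.h>

#define BUF_BYTES (4 * 1024 * 1024)

int main(void)
{
    FILE *f = fopen("packets.csv", "w");
    char *buf = malloc(BUF_BYTES);
    setvbuf(f, buf, _IOFBF, BUF_BYTES);   /* must be set before the first write */

    for (int i = 0; i < 1000000; i++)     /* stand-in for the packet loop */
        fprintf(f, "%d,%d\n", i, 64 + (i % 1400));

    fclose(f);                            /* flushes the final partial buffer */
    free(buf);
    return 0;
}
```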

Another solution is to have a separate writer thread, connected to the thread generating the data via a pipe or queue. The writer thread can then simply buffer data from the pipe/queue until it has a "block" (again, matching the file system block size is a good idea), then commit the block to the file. The pipe/queue then acts as a buffer, storing data generated while the writer thread is stalled writing to the file. The buffering afforded by the pipe, the block, the file system and the disk write-cache will likely accommodate any disk latency, so long as the fundamental write performance of the drive is faster than the rate at which data to write is being generated - nothing but a faster drive will solve that problem.
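A rough sketch of the writer-thread idea, here using an ordinary pipe as the queue and POSIX threads; the block size, pipe capacity and file name are illustrative only:

```c
/* Capture thread formats lines into a pipe; a writer thread drains the
 * pipe in large chunks and commits each chunk with a single fwrite(). */
#define _GNU_SOURCE                        /* for F_SETPIPE_SZ (Linux) */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int pfd[2];                         /* pfd[0] = read end, pfd[1] = write end */

static void *writer(void *arg)
{
    (void)arg;
    FILE *out = fopen("packets.csv", "w");
    static char block[1 << 20];            /* 1 MiB chunks */
    ssize_t n;

    while ((n = read(pfd[0], block, sizeof block)) > 0)
        fwrite(block, 1, (size_t)n, out);  /* one large write per chunk */

    fclose(out);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pipe(pfd);
    fcntl(pfd[1], F_SETPIPE_SZ, 1 << 20);  /* optional: enlarge the pipe buffer */
    pthread_create(&tid, NULL, writer, NULL);

    for (int i = 0; i < 1000000; i++) {    /* stand-in for the capture loop */
        char line[64];
        int len = snprintf(line, sizeof line, "%d,%d\n", i, 64 + (i % 1400));
        write(pfd[1], line, (size_t)len);  /* blocks only while the pipe is full */
    }

    close(pfd[1]);                         /* EOF lets the writer thread finish */
    pthread_join(tid, NULL);
    return 0;
}
```

With this arrangement the capture thread only stalls when the pipe itself fills, so short hiccups in the disk write are absorbed by the pipe buffer rather than by dropping data.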

Clifford

Use `sprintf` to write to a buffer in memory.
Make that buffer as large as possible, and when it gets full, use a single `fwrite` to dump the entire buffer to disk. Hopefully by that point it will contain many hundreds or thousands of lines of CSV data that will be written at once, while you begin to fill another in-memory buffer with more `sprintf` calls.
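A minimal sketch of that approach; the buffer size, the worst-case line length and the file name are arbitrary:

```c
/* Format lines into a large buffer with snprintf and flush it with one
 * fwrite whenever there may not be room for another full line. */
#include <stdio.h>
#include <stdlib.h>

#define BUF_BYTES (4 * 1024 * 1024)        /* 4 MiB in-memory buffer */
#define MAX_LINE  128                      /* worst-case CSV line length */

int main(void)
{
    FILE *f = fopen("packets.csv", "w");
    char *buf = malloc(BUF_BYTES);
    size_t used = 0;

    for (int i = 0; i < 1000000; i++) {    /* stand-in for the packet loop */
        used += snprintf(buf + used, BUF_BYTES - used,
                         "%d,%d\n", i, 64 + (i % 1400));
        if (used > BUF_BYTES - MAX_LINE) { /* flush before it can overflow */
            fwrite(buf, 1, used, f);
            used = 0;
        }
    }

    if (used)                              /* final partial buffer */
        fwrite(buf, 1, used, f);
    fclose(f);
    free(buf);
    return 0;
}
```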

abelenky