
I have a Python program that performs a long-running computation and writes its results to a file, C.csv.

While the program is running, I want to check whether data is being written, using:

$ less +F C.csv

What I notice is that while the program is running, nothing appears in C.csv, but as soon as I send CTRL+C, a lot of entries suddenly appear in the file.

Now, I know that disk I/O is generally buffered, and the program probably waits for the buffer to fill before it actually writes to the file (this is my assumption). So I searched for how to check the buffer size and found the following method:

import io
print(io.DEFAULT_BUFFER_SIZE)

This returns 8192 (bytes) on my machine. I thought writes would only happen when the data no longer fits in the buffer, i.e., when its size exceeds 8192 bytes. But when I check the size of C.csv after CTRL+C, it shows 236540 bytes.

How did so much data fit into the buffer? Or is something else happening?
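One way to test the buffering assumption directly is to record the file's on-disk size after every write. This is only a minimal sketch: the helper `sizes_during_writes`, the row content, and the row count are made up for illustration, and the buffer that `open()` actually picks may differ from `io.DEFAULT_BUFFER_SIZE` (the documentation says it is chosen from the device's block size where that is available).

```python
import os
import tempfile


def sizes_during_writes(path, row, n_rows):
    """Write `row` n_rows times; record the file's on-disk size after each write."""
    sizes = []
    with open(path, "w", newline="") as f:  # default buffering
        for _ in range(n_rows):
            f.write(row)
            sizes.append(os.path.getsize(path))
    # The file is closed here, so everything has been flushed.
    return sizes, os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "C.csv")  # throwaway file, not the real C.csv
    sizes, final_size = sizes_during_writes(demo_path, "1,2,3\n", 5000)

print("distinct sizes seen while writing:", sorted(set(sizes)))
print("size after close:", final_size)
```

If the size only changes in a few large steps rather than after every row, the data is sitting in a buffer between writes, and the step size hints at how large that buffer really is.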

Shubham
  • Programs generally have internal buffers before attempting a write to disk, and the OS might have its own buffers on top of that, so nothing is written unless the source program forces a flush somehow (or is killed, as you're doing now)... If both files are only 16 MB, you're better off loading them both into memory and joining them there using a `dict`, then writing them out, instead of an O(N^2) approach – Jon Clements Nov 24 '18 at 20:46
  • @JonClements Can you say a bit more about those internal buffers and when I should expect the data to be written to disk? What should I do if I want to monitor it in real time? Also, the `O(n^2)` was just an example; the two files don't actually have a common `id`-like column, so I had to do some computation to figure out whether they are related. I mentioned the `O(n^2)` thing just to paint a general scenario. – Shubham Nov 24 '18 at 20:54
  • Depends on your system and operating system... so nope - I can't give you more details... you can try flushing after each write to force it through, but apart from that, the OS is going to write to disk when it wants to. – Jon Clements Nov 24 '18 at 20:57
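The "flush after each write" suggestion from the comments could look roughly like the sketch below. It assumes the file is written with the standard `csv` module; the helper name `write_rows_with_flush` and the sample rows are hypothetical, since the original program is not shown.

```python
import csv
import os
import tempfile


def write_rows_with_flush(path, rows):
    """Write rows to a CSV file, flushing Python's buffer after every row.

    Returns the on-disk size observed after each flush.
    """
    sizes = []
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)
            f.flush()  # hand the buffered data to the operating system
            sizes.append(os.path.getsize(path))
    return sizes


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "C.csv")  # throwaway file for the demo
    flushed_sizes = write_rows_with_flush(demo_path, [(i, i * i) for i in range(3)])

print(flushed_sizes)
```

With a flush after every row, a tool such as `less +F` should see the file grow row by row. `f.flush()` only moves data from Python to the OS; calling `os.fsync(f.fileno())` afterwards additionally asks the OS to write it to the device, at a noticeable cost in speed. Opening the file with `buffering=1` (line buffering in text mode) is another option that avoids the explicit flush calls.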
