
I'm writing a GUI program that synchronizes files in a folder with a server. What I know about these files is that they are only ever written, never removed. My concern is that I might start uploading a file while it is still being written. To avoid this, I came up with the approach below, and I'd like an expert to tell me whether it is flawed.

I have an event loop with a timer. Every time this timer ticks, it checks whether new files have been added. If new files are found, I use this simple function to get the file size:

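#include <fstream>
#include <string>

// Returns the file size in bytes, or 0 if the file cannot be opened.
std::size_t GetFileSize(const std::string &filename)
{
    std::ifstream file(filename.c_str(), std::ios::binary | std::ios::ate);
    if (!file) return 0;  // tellg() would return -1 here and wrap around in size_t
    return static_cast<std::size_t>(file.tellg());
}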

Then I store each new file's name and size in a data structure of the following form (omitting std:: to make it visually friendly, as it would otherwise appear 5 times in the next line):

deque<pair<string, pair<size_t, long> > > fileMonitor;

(Please suggest a better data structure if possible. unordered_multimap seems to do a similar job.)

So this will store the file name (in that string), its size (in that size_t) and the number of times the size of the file was checked without a change, let's call it checks. So every time the timer ticks, I look for new files, and check whether the size of the files in fileMonitor has changed. For a single file, if the file size is different than before, then checks = 1, and if the file size is the same, then I do checks++.

Now, in each iteration, I check whether the timer's interval*checks > timeout. If so, the file hasn't changed for a long enough time, and I can judge that the file is stable and no longer being updated.
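The counting scheme described above can be sketched as follows. This is only a minimal illustration, not the asker's actual code: the names FileState, StabilityMonitor, observe and forget are hypothetical, and an unordered_map keyed by file name is assumed as one possible alternative to the nested pair. File sizes are passed in by the caller, so the logic is independent of how the size is obtained.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical per-file record: last observed size and the number of
// consecutive timer ticks on which that size was seen.
struct FileState {
    std::size_t size;
    long checks;
};

// Hypothetical helper that tracks files by name.
class StabilityMonitor {
public:
    StabilityMonitor(long intervalMs, long timeoutMs)
        : intervalMs_(intervalMs), timeoutMs_(timeoutMs) {}

    // Call once per timer tick for each file, passing its current size.
    // Returns true when interval * checks exceeds the timeout.
    bool observe(const std::string &name, std::size_t currentSize)
    {
        // operator[] value-initializes a new entry, so checks starts at 0.
        FileState &state = files_[name];
        if (state.checks == 0 || state.size != currentSize) {
            // New file or size changed: restart the count.
            state.size = currentSize;
            state.checks = 1;
        } else {
            ++state.checks;
        }
        return intervalMs_ * state.checks > timeoutMs_;
    }

    // Stop tracking a file, for example after it has been uploaded.
    void forget(const std::string &name) { files_.erase(name); }

private:
    long intervalMs_;
    long timeoutMs_;
    std::unordered_map<std::string, FileState> files_;
};
```

With a 1000 ms interval and a 2500 ms timeout, a file would be reported stable on the third consecutive tick with an unchanged size. A map keyed by name gives direct lookup per file, whereas the deque has to be searched linearly.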

Obvious question: Why don't I use something like inotify? Because I need something cross platform and simple in structure, as I already know the behavior of the files I'm gonna upload. Unfortunately boost doesn't provide a solution for this, so I had to invent my own.

The Quantum Physicist
  • Does your `GetFileSize` actually work? Personally I would've called `stat` to get the size and last modification time. – melpomene Jul 18 '15 at 08:28
  • @melpomene I'm sorry, what do you mean with "actually works"? Do you mean works while updating the file size or normally works for any file? It works for regular files with no problems. – The Quantum Physicist Jul 18 '15 at 08:29

1 Answer

Do you have access to the writing program? In that case I would recommend first writing the data into a temporary file and renaming it only after writing has finished (a rename is more or less an atomic operation on a file system). Otherwise your "wait an appropriately long time for a change" approach always has the potential to fail, because you cannot tell why the writing program might leave the file unchanged for a long time.
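A minimal sketch of the write-then-rename idea on the writer's side, assuming standard C++ only. The function name WriteThenRename and the ".part" suffix are hypothetical choices for illustration. The temporary file is placed next to the final file so that the rename stays within one file system.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Writes `data` to a temporary file and renames it to `finalPath` only
// after the write has completed. Returns true on success.
bool WriteThenRename(const std::string &finalPath, const std::string &data)
{
    // Hypothetical naming scheme: the uploader would ignore "*.part" files.
    const std::string tempPath = finalPath + ".part";

    std::ofstream out(tempPath.c_str(), std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }
    out << data;
    out.close();  // flushes; sets failbit if closing fails
    if (!out) {
        std::remove(tempPath.c_str());
        return false;
    }

    // On POSIX systems rename() replaces the target atomically. What
    // std::rename does when the target already exists is implementation-
    // defined, so this sketch assumes finalPath does not exist yet.
    if (std::rename(tempPath.c_str(), finalPath.c_str()) != 0) {
        std::remove(tempPath.c_str());
        return false;
    }
    return true;
}
```

The uploader then only needs to skip files carrying the temporary suffix; any file under its final name is complete by construction.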

Additions for the HDF5 format:

Files may even change their content without changing their size, but:

From the HDF5 file format specification (https://www.hdfgroup.org/HDF5/doc/H5.format.html#FileMetaData):

File Consistency Flags

This value contains flags to indicate information about the consistency of the information contained within the file. Currently, the following bit flags are defined:

Bit 0 set indicates that the file is opened for write-access.
Bit 1 set indicates that the file has been verified for consistency and is guaranteed to be consistent with the format defined in this document.
Bits 2-31 are reserved for future use.

Bit 0 should be set as the first action when a file is opened for write access and should be cleared only as the final action when closing a file. Bit 1 should be cleared during normal access to a file and only set after the file's consistency is guaranteed by the library or a consistency utility.

I would assume that the HDF5 APIs provide methods to open these files exclusively, and I would try that in addition to your polling approach.
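As an illustration of the two flag bits quoted above, here is a small sketch that decodes them. It assumes the File Consistency Flags value has already been read from the file's superblock; where that field sits depends on the superblock version and is not shown here. The names ConsistencyFlags and DecodeConsistencyFlags are hypothetical, and whether a particular writer actually maintains these bits should be verified before relying on them.

```cpp
#include <cstdint>

// Hypothetical decoded view of the File Consistency Flags field.
struct ConsistencyFlags {
    bool openForWrite;        // bit 0: file is opened for write access
    bool verifiedConsistent;  // bit 1: file has been verified for consistency
};

// Decodes the two defined bits; the remaining bits are reserved and ignored.
ConsistencyFlags DecodeConsistencyFlags(std::uint32_t flags)
{
    ConsistencyFlags result;
    result.openForWrite = (flags & 0x1u) != 0;
    result.verifiedConsistent = (flags & 0x2u) != 0;
    return result;
}
```

An uploader could postpone any file whose openForWrite bit is set and re-check it on the next timer tick.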

Oncaphillis