2

We collect CSV-like text data over long periods (~days) from costly experiments, so file corruption is to be avoided at all costs.

Recently, a file was copied in Windows Explorer on XP whilst the experiment was in progress, and the data was partially lost, presumably due to a multiple-access conflict.

What are some good techniques to avoid such loss? We are using Delphi on Windows XP systems.

Some ideas we came up with are listed below - we'd welcome comments as well as your own input.

Brian Tompsett - 汤莱恩
Brendan

8 Answers

9

Use a database as a secondary data storage mechanism and take advantage of its atomic transaction mechanisms.
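A minimal sketch of this idea using Python's built-in sqlite3 (an embedded database, so there is no separate server to administer; the `readings` table and column names are hypothetical, and a real deployment would use a file path rather than `:memory:`):

```python
import sqlite3

# In-memory database for the sketch; in practice use a file path
# such as "experiment.db" so the data survives restarts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS readings (ts REAL, value REAL)")

def record(ts, value):
    # Each with-block runs as an atomic transaction: the row is either
    # fully committed or not written at all.
    with conn:
        conn.execute("INSERT INTO readings VALUES (?, ?)", (ts, value))

record(0.0, 1.23)
record(1.0, 4.56)
```

The point of the transaction is that a reader (or a crash) never sees a half-written row, which is exactly the failure mode the Explorer copy triggered.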

Jonathan Leffler
Brendan
  • A database would provide other benefits and could be used for generating reports. – stukelly Nov 15 '08 at 16:35
  • Introducing dependencies on another complex system is a really bad idea. Experiment handling computers should not have software on them with unpredictable processor and disk load characteristics. – Stephan Eggermont Nov 16 '08 at 14:41
  • Stephan has a valid point - one thing that I should emphasise is that the computers here are unlikely to have a professional 'administering' them; I was reluctant to implement a database because it adds a whole new layer. KISS seems to have a lot of weight in this environment... – Brendan Nov 16 '08 at 17:02
6

How about splitting the large file into separate files, one for each day?
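A sketch of the naming scheme in Python (the `experiment_` prefix is hypothetical; the same string formatting is trivial in Delphi):

```python
import datetime

def daily_logfile(ts: datetime.datetime) -> str:
    # One file per day: because the date is in the name, copying
    # yesterday's file never touches the file being written today.
    return "experiment_%s.csv" % ts.strftime("%Y-%m-%d")

name = daily_logfile(datetime.datetime(2008, 11, 15, 9, 30))
```

Only the current day's file is ever open; everything older is safe to copy at will.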

stukelly
1

If these machines are on a network, send an HTTP POST with the logging data to a web server (sending UDP packets would be even simpler).
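A minimal Python sketch of the UDP variant, with a localhost socket standing in for the collector machine (the address and CSV line are illustrative):

```python
import socket

# Stand-in for the collector machine on the local network: bind a
# receiving socket on an ephemeral localhost port.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
collector = rx.getsockname()

# The experiment machine fires one datagram per CSV line; no file is
# ever opened locally, so Explorer has nothing to collide with.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"2008-11-15 09:30:00,1.23\n", collector)

line, _ = rx.recvfrom(4096)
```

The trade-off discussed in the comments below applies: UDP gives no delivery guarantee, so the collector should be on the same reliable local segment.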

Make sure you only copy old data. If the filename carries a timestamp with a 1-hour resolution, you can safely copy the data older than 1 hour.
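The filter could look like this Python sketch, assuming a hypothetical `data_YYYY-MM-DD_HH.csv` naming scheme:

```python
import datetime

def safe_to_copy(filename: str, now: datetime.datetime) -> bool:
    # Filenames are assumed to carry an hourly timestamp, e.g.
    # "data_2008-11-15_09.csv". A file stamped more than an hour ago
    # is no longer being written, so copying it cannot conflict with
    # the running experiment.
    stamp = datetime.datetime.strptime(filename, "data_%Y-%m-%d_%H.csv")
    return now - stamp > datetime.timedelta(hours=1)
```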

Stephan Eggermont
  • UDP provides an unreliable service and datagrams may arrive out of order, appear duplicated, or go missing without notice. (wikipedia) – mjn Nov 11 '10 at 07:39
  • Which is not an issue. It's a local network. With the kind of problems you'll have there (people unconnecting network cables or removing power from the router) HTTP will not be an improvement over UDP. – Stephan Eggermont Nov 14 '10 at 20:10
0

If a write fails, cache the result for a later write; that way, if the file is opened externally, the data is still stored internally, or could even be written to disk elsewhere.
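A sketch of this caching idea in Python (Delphi code would follow the same shape); the in-memory backlog is flushed on the next successful write:

```python
import os
import tempfile

pending = []  # data points that have not reached the disk yet

def append_line(path, line):
    # Queue the new point, then try to flush the whole backlog; if the
    # file is locked by another process, the data stays cached in
    # memory for the next attempt instead of being lost.
    pending.append(line)
    try:
        with open(path, "a") as f:
            f.writelines(pending)
        pending.clear()
    except OSError:
        pass  # file busy: keep the backlog for later

path = os.path.join(tempfile.mkdtemp(), "data.csv")
append_line(path, "0.0,1.23\n")
append_line(path, "1.0,4.56\n")
```

As the comment below notes, this only helps against transient locks; a power cut still loses whatever is in `pending`.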

Brendan
  • If the write fails for a long period, then you lose the data stored in memory if the computer loses power. – stukelly Nov 15 '08 at 16:38
  • @stukelly, this is true however these ideas are not 'either/or' I envisage implementing more than one of them to make the system more robust. – Brendan Nov 16 '08 at 17:06
0

I think what you're looking for is the Win32 CreateFile API, with these flags:

FILE_FLAG_WRITE_THROUGH : Write operations will not go through any intermediate cache, they will go directly to disk.

FILE_FLAG_NO_BUFFERING : The file or device is being opened with no system caching for data reads and writes. This flag does not affect hard disk caching or memory mapped files. There are strict requirements for successfully working with files opened with CreateFile using the FILE_FLAG_NO_BUFFERING flag, for details see File Buffering.
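In Delphi these flags go straight into the dwFlagsAndAttributes argument of CreateFile. As a rough, portable analogue of write-through behaviour (not the Win32 call itself), here is a Python sketch that flushes and fsyncs after every write:

```python
import os
import tempfile

def write_through(path, line):
    # Rough analogue of FILE_FLAG_WRITE_THROUGH: flush() empties the
    # runtime's buffer and fsync() asks the OS to push the bytes to
    # the device before we continue.
    with open(path, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())

path = os.path.join(tempfile.mkdtemp(), "data.csv")
write_through(path, "0.0,1.23\n")
```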

PatrickvL
0

Each experiment should use a 'work' file and a 'done' file. The work file is opened exclusively, and the 'done' file is copied to a place on the network. An application on the receiving machine would feed those files into a database. If Explorer tries to move or copy the work file, it will receive an 'Access denied' error.

A 'work' file becomes 'done' after a certain period (say, 6/12/24 hours, or whatever period suits). The program then creates another work file (its name must contain the timestamp) and sends the 'done' file over the network (or a human can do that, which, if I understand your text correctly, is what you are doing now).

Copying a file while it is in use is asking for it to be corrupted.
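The rollover could be driven by a name-generating helper; a Python sketch with a hypothetical `exp_*.work` naming scheme and a 12-hour period:

```python
import datetime

def current_work_file(now: datetime.datetime, period_hours: int = 12) -> str:
    # The name encodes the start of the current period; when the
    # period rolls over the function yields a new name, and the old
    # '.work' file can be renamed to '.done' and shipped over the
    # network.
    start_hour = (now.hour // period_hours) * period_hours
    slot = now.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    return "exp_%s.work" % slot.strftime("%Y%m%d_%H")
```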

Fabricio Araujo
-1

Write data to a buffer file in an obscure directory and copy the data to the 'public' data file periodically (every 10 points, for instance), thereby reducing writes to the public file and also providing a backup.
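A Python sketch of the batching idea (the file names and 10-point batch size are illustrative):

```python
import os
import shutil
import tempfile

BATCH = 10  # copy to the public file every 10 points
count = 0

def log_point(buffer_path, public_path, line):
    global count
    # Every point goes to the private buffer immediately; only each
    # BATCH-th point triggers a copy of the whole buffer over the
    # public file, so the public file is written far less often and
    # the buffer doubles as a backup.
    with open(buffer_path, "a") as f:
        f.write(line)
    count += 1
    if count % BATCH == 0:
        shutil.copyfile(buffer_path, public_path)

d = tempfile.mkdtemp()
buf = os.path.join(d, "buffer.csv")
pub = os.path.join(d, "public.csv")
for i in range(10):
    log_point(buf, pub, "%d,%.1f\n" % (i, i * 0.5))
```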

Brendan
  • Security through obscurity is rarely a good thing. You'll end up spending a lot of time positioning yourself for new writes to the public file if it is very large. – tvanfosson Nov 15 '08 at 16:33
-1

Write data points discretely, i.e. open and close the file handle for every data-point write; this reduces the amount of time the file is held open, provided the interval between data points is long compared with the write itself.
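Sketched in Python (the same pattern translates directly to Delphi's file routines):

```python
import os
import tempfile

def write_point(path, ts, value):
    # Open, append one line, close: the file is held open only for
    # the duration of a single write, minimising the window in which
    # an external copy can collide with the experiment.
    with open(path, "a") as f:
        f.write("%s,%s\n" % (ts, value))

path = os.path.join(tempfile.mkdtemp(), "data.csv")
write_point(path, "0.0", "1.23")
write_point(path, "1.0", "4.56")
```

Opening in append mode also means each write lands at the end of the file without an explicit seek, as the second comment below points out.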

Brendan
  • You're just narrowing the window of opportunity, not eliminating it. You'll also have to deal with seeking to the end of the data each time to append. – tvanfosson Nov 15 '08 at 16:34
  • @tvanfosson: on a civilized system (of which XP might not be an example), you can open the file for append so that each write is automatically at the end of the file. – Jonathan Leffler Nov 16 '08 at 05:22