I'm trying to compute the checksum of a transferred file. The traditional way is to receive the file, write it to disk, and then read it back from disk to compute the checksum. Alternatively, I can write and read simultaneously to speed things up. I observed that writing and reading concurrently finishes faster, since the reads are served from the cache instead of going to disk. However, I am worried about whether my checksum calculation is still reliable, since I think one of the reasons for computing the checksum is to detect disk write errors. If so, would writing and reading concurrently miss disk write errors?

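            // Assumes java.io.*, java.nio.file.*, and java.security.* are imported,
            // and that totalWriteSize is defined elsewhere.
            MessageDigest md;
            try {
                md = MessageDigest.getInstance("MD5");
            } catch (NoSuchAlgorithmException e) {
                // Fail fast instead of continuing with a null digest.
                throw new IllegalStateException(e);
            }

            byte[] bufferWrite = new byte[4096];
            byte[] bufferRead = new byte[4096];
            long current = 0L;

            long startTime = System.currentTimeMillis();
            // try-with-resources closes both streams even if an I/O error occurs.
            try (FileOutputStream fos = new FileOutputStream("testwrite.jpg");
                 DigestInputStream dis = new DigestInputStream(
                         Files.newInputStream(Paths.get("testwrite.jpg")), md)) {
                while (current < totalWriteSize) {
                    fos.write(bufferWrite, 0, 4096);
                    fos.flush();
                    // read() may return fewer bytes than requested, so loop until
                    // the block just written has been read back and digested.
                    int remaining = 4096;
                    while (remaining > 0) {
                        int n = dis.read(bufferRead, 0, remaining);
                        if (n < 0) break;
                        remaining -= n;
                    }
                    current += 4096;
                }
            }
            byte[] checksum = md.digest();
            long elapsedMillis = System.currentTimeMillis() - startTime;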
earslan

1 Answer

As you say, if the objective is to check that the disk write was correct, then getting the read back from a cache doesn't check that. What's more, the same thing can happen when you write the whole thing to the disk and read it back. You may still be getting data back from a cache in the operating system and/or the drive itself. In order to really check what was written on the disk, you would need to somehow clear out all of those caches before the read back.

Some operating systems provide calls to flush all pending write data out to the disk, and to further request that the drive flush its write buffers as well, but I am not aware of calls that also clear all of the data caches so that a read back forces a physical read of the disk. The only reliable way I can think of to clear out all of those caches is to shut down the system and remove power from the drive.
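As a rough illustration of the flush step in Java, the sketch below writes a buffer, calls `FileChannel.force(true)` to ask the operating system to push the file's content and metadata to the storage device, and then reads the file back to compare MD5 digests. The class name `ForcedWriteChecksum`, its helper methods, and the placeholder data are made up for this example. Note that `force` only flushes pending writes; it does not evict cached data, so the read back may still be served from a cache, and whether the drive's own buffers are flushed depends on the platform and the device.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper class, named for this example only.
public class ForcedWriteChecksum {

    /** Writes data to path and asks the OS to push it to the storage device. */
    static void writeAndForce(Path path, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // Flushes file content and metadata; it does not evict cached pages.
            channel.force(true);
        }
    }

    /** Reads the file back and returns its MD5 digest. */
    static byte[] md5OfFile(Path path) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = new DigestInputStream(Files.newInputStream(path), md)) {
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // Reading through DigestInputStream updates md as a side effect.
            }
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[1 << 20]; // placeholder for the received file content
        Arrays.fill(data, (byte) 7);
        Path path = Files.createTempFile("forced-write", ".bin");
        try {
            writeAndForce(path, data);
            byte[] expected = MessageDigest.getInstance("MD5").digest(data);
            byte[] actual = md5OfFile(path);
            System.out.println(MessageDigest.isEqual(expected, actual)
                    ? "checksums match" : "checksums differ");
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

A matching digest here shows that the data the operating system returns equals the data that was sent, which catches transfer and software errors; by itself it does not prove that the bits on the physical medium are correct.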

Mark Adler
  • Thanks for the response, Mark, but shutting down the system sounds impractical. Do you, by any chance, know how the traditional approach prevents reading from the cache? – earslan Jan 05 '18 at 20:29
  • You haven't said what system you're using. I suggest you google for your system. In Linux there is an `O_DIRECT` option to `open()` that _may_ do what you want. I do wonder how you would come up with a test to prove that you have found a way to read from the physical drive. – Mark Adler Jan 05 '18 at 20:39