
I want to programmatically check the integrity of the files that I'm copying to a shared folder. This is part of a set of steps executed by an automation tool, so I'm looking for the best (most elegant) way to make the process more reliable without involving manual operations.

About the shared folder: it's a storage server and I can't deploy any of my own code there. Would it make sense to contact someone who owns the server so they can provide the info I need (i.e., the SHA-256 hash)? Any alternatives?

the_marcelo_r
    The most reliable way to check the integrity would be by comparing file hashes. Can you somehow get the sha-xxx hash of the remote files for comparison? – aurbano Apr 09 '14 at 11:52
  • Exactly, that's my question! I wanted to gather some tangible ideas/best practices for it. – the_marcelo_r Apr 09 '14 at 12:04
  • Well, what kind of access do you have to the remote machine? You basically need to get the hashes of the files that you uploaded, so that you can compare them on the local machine. – aurbano Apr 09 '14 at 12:10

2 Answers


If you are allowed to download the files you uploaded, check the integrity on your own machine by computing the hashes of both copies and comparing them. Or does downloading take too long for that?

UPDATE:

To discuss a proper solution, I think we need some preconditions or assumptions we can rely on. May we assume the following?

  • it is very likely that an upload is not erroneous (i.e., it succeeds with probability p)
  • it is very likely that a download is not erroneous (i.e., it succeeds with probability q)

Hence, uploading a file and then downloading it again should succeed with probability p*q. If p*q is very high and the cost of downloading an uploaded file is low, checking the hashes on the local machine is suitable, right?
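A minimal sketch of that round-trip check, using only JDK APIs (Java 17+ for `HexFormat`; the file paths are placeholders — point them at your local original and the copy downloaded back from the share):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

public class HashCheck {

    // Compute the SHA-256 digest of a file as a lowercase hex string.
    // Streams the file in chunks, so a large .ear does not need to fit in memory.
    static String sha256(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) > 0; ) {
                md.update(buf, 0, n);
            }
        }
        return HexFormat.of().formatHex(md.digest());
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: HashCheck <original> <downloaded-copy>");
            return;
        }
        Path original  = Path.of(args[0]); // local file before upload
        Path roundTrip = Path.of(args[1]); // same file downloaded back
        boolean ok = sha256(original).equals(sha256(roundTrip));
        System.out.println(ok ? "integrity OK" : "MISMATCH - retry transfer");
    }
}
```

On a mismatch the automation tool can simply repeat the upload; as noted above, for small files the extra download is cheap insurance.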

Harmlezz
  • What if some anomaly manifests itself during the upload but not on the download phase, then how can I guarantee that the uploaded file is valid? – the_marcelo_r Apr 09 '14 at 12:03
  • This you can't. Consider: the upload of the file is ok, but the upload of the hash is erroneous. Or both the file and the hash are corrupted but still match (very unlikely, but possible). Or the download of the uploaded file is corrupted but the upload was ok, and so on. But if it is very likely that an upload is not corrupted, and the same is true for a download, you should be fine in most cases. And if the verification fails, just do it again, if it is not too costly. – Harmlezz Apr 09 '14 at 12:08
  • Most operations involve a set of small files: html, css, js, jsps, etc. But there's an infrequent need to move an .ear package, which is a very heavy file (i.e., we don't want to move that around the network twice). But I get your point; your approach would give me some reference about the stability of the network. – the_marcelo_r Apr 09 '14 at 12:21

Other than the checks done by the SFTP or HTTPS protocols themselves, Apache Commons VFS does not provide support for this. If your server does not cooperate, uploading extra files like .sha1 or .md5 checksums (or PGP .asc signatures) alongside the payload is common practice. For VFS this is just a second file to up/download.
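A sketch of generating such a sidecar file before the upload, assuming Java 17+ and plain JDK APIs (the class and method names here are illustrative, not part of VFS). It writes the digest in the same `<hex>  <filename>` format `sha256sum` produces, so whoever owns the server can verify it with standard tools:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

public class SidecarHash {

    // Write "<hexdigest>  <filename>" to a .sha256 file next to the artifact,
    // matching the output format of the sha256sum tool, and return its path.
    static Path writeSidecar(Path artifact) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(artifact)) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) > 0; ) {
                md.update(buf, 0, n);
            }
        }
        String line = HexFormat.of().formatHex(md.digest())
                + "  " + artifact.getFileName() + "\n";
        Path sidecar = Path.of(artifact + ".sha256");
        Files.writeString(sidecar, line);
        return sidecar;
    }
}
```

The automation tool would then upload both files; the receiving side (or a later download-and-verify step) can check the pair with `sha256sum -c artifact.ear.sha256`.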

eckes