We have an AIX server with curl 7.40 installed. We use shell scripts and curl to download files from AWS, and the transfer protocol is FTPS.

As part of our design, we have to download the files and run a checksum (MD5 or some other integrity check) before we move to the next step, which is deleting these files from the remote server (AWS).

I am struggling to come up with a process or method to perform a checksum or any other integrity check. The check is important because we need to be sure that each download was clean and complete, with no data loss, before we proceed to deletion.

Here is the code we have written to list the files and download them:

# List the files on the AWS server (note: the long option --tlsv1.2 takes two dashes)
curl --tlsv1.2 --cacert certs.cer --ftp-ssl --ftp-skip-pasv-ip --list-only --cert "${CERT_USER}:${CERT_USER_PWD}" "ftp://${USER_NAME}:${USER_PWD}@awscloud.com:50021/Inbox/*" > "${LOGS}"

# Download the files named in the list file
awk '{print $1}' "${LOGS}" | \
while read -r FILENAME
do
 echo "File being copied currently is ${FILENAME}"
 curl --tlsv1.2 --cacert certs.cer --ftp-ssl --ftp-skip-pasv-ip --cert "${CERT_USER}:${CERT_USER_PWD}" "ftp://${USER_NAME}:${USER_PWD}@awscloud.com:50021/Inbox/${FILENAME}" -o "${FILENAME}"
done

While searching the internet, I came across the --head parameter. We tried it in our code and it displays the file size. However, with hundreds of files to download, it would be difficult to cut and compare the output of the command in an automated solution:

curl --tlsv1.2 --cacert certs.cer --ftp-ssl --ftp-skip-pasv-ip --head --cert "${CERT_USER}:${CERT_USER_PWD}" "ftp://${USER_NAME}:${USER_PWD}@awscloud.com:50021/Inbox/TEXT_FILE"

Output:
 Last-Modified: Wed, 09 Aug 2017 07:58:07 GMT
 Content-Length: 1379
 Accept-ranges: bytes
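
For reference, the size comparison described above could be scripted roughly as follows. This is only a sketch: the helper names remote_size, local_size and sizes_match are made up for illustration, and it assumes the --head output always carries a Content-Length line like the one shown. A matching size only shows that the byte count agrees; it is not a checksum.

```shell
#!/bin/sh
# Hypothetical helpers (not part of curl): compare the size reported by
# `curl --head` with the size of the downloaded file.

# Read `curl --head` output on stdin and print the Content-Length value.
# tr strips the carriage returns that header lines may carry.
remote_size() {
    tr -d '\r' | awk 'tolower($1) == "content-length:" { print $2; exit }'
}

# Print the size of a local file in bytes (awk strips any padding from wc).
local_size() {
    wc -c < "$1" | awk '{ print $1 }'
}

# $1 = remote size, $2 = local file.
# Succeeds only when the size is known, the file exists, and both agree.
sizes_match() {
    [ -n "$1" ] && [ -f "$2" ] && [ "$1" -eq "$(local_size "$2")" ]
}

# Intended use inside the download loop (same curl options as above):
#   SIZE=$(curl ... --head "ftp://.../Inbox/${FILENAME}" | remote_size)
#   sizes_match "${SIZE}" "${FILENAME}" || echo "Size mismatch: ${FILENAME}"
```

Only files for which sizes_match succeeds would then be candidates for deletion on the remote side.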
halfer
sasen7
  • What is the content of the LOGS file? Okay, the first column is the file-name, are there more columns in the file? – Zsigmond Lőrinczy Aug 25 '17 at 18:21
  • @ZsigmondLőrinczy, Hi, LOGS is a temporary file that we are creating to hold the file names available in the remote (AWS direct) server. It has only 1 column and includes the file names. – sasen7 Aug 26 '17 at 13:48

1 Answer

Depending on the size of the file, you could probably use the S3 ETag; see this answer: https://stackoverflow.com/a/19896823/1135424

Also check this:

https://aws.amazon.com/premiumsupport/knowledge-center/data-integrity-s3/
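
A minimal sketch of the ETag comparison, under some assumptions: the file is also reachable as an S3 object so that its ETag can be obtained somehow (for example with `aws s3api head-object`), and the object was uploaded in a single part without KMS encryption, in which case the ETag is the hex MD5 of the content. Multipart uploads produce an ETag with a `-N` suffix that is not a plain MD5, which is why file size matters. The function names file_md5 and etag_matches are hypothetical, and the `csum -h MD5` fallback assumes the AIX command prints the hash in the first column.

```shell
#!/bin/sh
# Hypothetical helpers: compare a local file's MD5 with an S3 ETag.

# Print the lowercase hex MD5 of a file with whichever tool is available.
file_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | awk '{ print $1 }'
    elif command -v openssl >/dev/null 2>&1; then
        # Output looks like "MD5(file)= <hash>", so take the last field.
        openssl md5 "$1" | awk '{ print $NF }'
    else
        # AIX fallback; assumes the hash is the first column of the output.
        csum -h MD5 "$1" | awk '{ print $1 }'
    fi
}

# $1 = ETag as reported by S3 (may be wrapped in double quotes)
# $2 = local file
# Returns 0 on match, 1 on mismatch, 2 when the ETag is a multipart ETag.
etag_matches() {
    etag=$(printf '%s' "$1" | tr -d '"')
    case "$etag" in
        *-*) return 2 ;;   # multipart ETag: not a plain MD5, cannot compare
    esac
    [ "$etag" = "$(file_md5 "$2")" ]
}

# Intended use before deleting the remote copy (ETAG obtained separately):
#   etag_matches "${ETAG}" "${FILENAME}" || echo "Checksum mismatch: ${FILENAME}"
```

A return code of 2 means the check could not be made, so the remote file should not be deleted on that basis alone.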

nbari