I am using Apache Hadoop 2.7.1 on CentOS, and I am new to CentOS.
If I want to calculate the MD5 checksum of a specific file in HDFS, I can issue the following command:
hdfs dfs -cat /hadoophome/myfile | md5sum
But how can I calculate the MD5 checksum for all files in the /hadoophome HDFS directory?
In other words, how do I write a script that iterates through every file in /hadoophome and writes each filename plus its MD5 checksum, one per line, into a single results file?
Note: I am forced to cat each HDFS file and pipe it through md5sum; I cannot use
hadoop fs -checksum
because that does not return a plain MD5 value.
I began with the following script:
for i in $(hadoop fs -ls /hadoophome | sed '1d;s/  */ /g' | cut -d' ' -f8); do hdfs dfs -cat "$i" | md5sum; done
but it only prints the checksums (with "-" as the filename), not which HDFS file each one belongs to.