
I am using Apache Hadoop 2.7.1 on CentOS, and I am new to CentOS.

If I want to calculate the MD5 checksum of a specific file in Hadoop, I can issue the following command:

hdfs dfs -cat /hadoophome/myfile | md5sum

But what if I want to calculate the MD5 checksum of every file in the hadoophome HDFS directory?

In other words, how do I write a script that iterates through all files in /hadoophome (a specific HDFS directory) and writes each filename plus its MD5 checksum, one per line, to a single file containing all the results?

Note: I have to cat each HDFS file and pipe it into md5sum; I cannot use

hadoop fs -checksum

because I want the plain MD5 value.

I began with the following script:

for i in $(hadoop fs -ls /hadoophome | sed '1d;s/  */ /g' | cut -d' ' -f8); do
  hdfs dfs -cat "$i" | md5sum
done
oula alshiekh

1 Answer


You can use the find command to exec a command on each file found in a given directory and its subdirectories, and then redirect the output to another file (note that find walks the local filesystem, so this applies when the directory is accessible as a local path):

# find /hadoophome -type f -exec md5sum "{}" \; >> /tmp/file-list.txt

The output looks like this:

# find /bin/ -type f -exec md5sum "{}" \; 
...snip...
2de30aeb16259b7051520d2c6c18b848  /bin/mlnx_dump_parser
e1f7d74a86c8fa85588e239f974a6d24  /bin/ibv_task_pingpong
9fbb31d5760f35911eeb644d99c615ab  /bin/mlnx_get_vfs.pl
9f43d9718c5e41727a6520080158b494  /bin/flint_ext
2f315aa63072d96718e7fe268643869c  /bin/mlnx_perf
f31173018f34839e24d5ecf25c811a30  /bin/fwtrace
361cb80244b429f4df29ea2555eee134  /bin/mlnx_qcn
c17cd67a2e996881d9157ec30b7b215f  /bin/mdevices_info
49f03faf85a80d54eedea5ef69358f01  /bin/mlnx_qos
...snip...
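If /hadoophome exists only in HDFS rather than as a locally mounted path, a similar loop over hdfs dfs -ls output can produce the same "checksum  filename" format. This is a minimal sketch, assuming the listed paths contain no spaces; the output path /tmp/md5-list.txt is arbitrary:

```shell
#!/bin/sh
# List every entry under /hadoophome in HDFS. The first line of -ls output
# is a "Found N items" header, so skip it; the path is the last field.
hdfs dfs -ls /hadoophome | awk 'NR > 1 {print $NF}' | while read -r path; do
  # Stream the file's contents through md5sum and keep only the digest.
  sum=$(hdfs dfs -cat "$path" | md5sum | awk '{print $1}')
  echo "$sum  $path"
done > /tmp/md5-list.txt
```

Because md5sum reads from stdin here, it prints "-" as the filename, which is why the script substitutes the real HDFS path itself.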
Matt Kereczman