
When I run hdfs dfs -checksum /file.txt in the terminal, it prints:

    /file.txt MD5-of-0MD5-of-512CRC32C 000002000000000000000000ccfadcfdcff630efa5628fb72620d535

How was this value calculated?

To my understanding, CRC-32 is used to calculate the checksum of the file.

How does CRC-32 calculate the checksum value?

tevemadar

1 Answer


It's not CRC32 but CRC32C that is used here. CRC32C can be calculated in parallel (the input is processed as three interleaved streams, and modern x86 CPUs accelerate it with a dedicated instruction), so it is faster than CRC32. The polynomial is also different: CRC32C uses 0x1EDC6F41 (bit-reversed: 0x82F63B78), while CRC32 uses 0x04C11DB7. Apart from the polynomial, the algorithm is the same as CRC32.
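To make the difference concrete, here is a minimal sketch using the JDK's built-in implementations (java.util.zip.CRC32 and, since Java 9, java.util.zip.CRC32C); the input string is just an arbitrary example. The two checksums of the same bytes differ because only the polynomial differs:

    import java.nio.charset.StandardCharsets;
    import java.util.zip.CRC32;
    import java.util.zip.CRC32C;
    import java.util.zip.Checksum;

    public class CrcCompare {
        public static void main(String[] args) {
            byte[] data = "hello hdfs".getBytes(StandardCharsets.UTF_8);

            // CRC32: polynomial 0x04C11DB7 (bit-reversed 0xEDB88320)
            Checksum crc32 = new CRC32();
            crc32.update(data, 0, data.length);

            // CRC32C (Castagnoli): polynomial 0x1EDC6F41 (bit-reversed 0x82F63B78)
            Checksum crc32c = new CRC32C();
            crc32c.update(data, 0, data.length);

            // Same input, same algorithm structure, different polynomial:
            // the two 32-bit values do not match.
            System.out.printf("CRC32  = %08x%n", crc32.getValue());
            System.out.printf("CRC32C = %08x%n", crc32c.getValue());
        }
    }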

If you want to see the algorithm, have a look at the Wikipedia article on cyclic redundancy checks.

LenglBoy
  • A commonly used error-detecting code is CRC-32 (32-bit cyclic redundancy check), which computes a 32-bit integer checksum for input of any size. CRC-32 is used for checksumming in Hadoop's ChecksumFileSystem, while HDFS uses a more efficient variant called CRC-32C. What is the difference? – CrazyMinion Apr 16 '18 at 10:19
  • Parallel calculation and a different polynomial: both are 32 bits long and use the same algorithm, but with different polynomial constants, just like I told you. So you can't use CRC-32 when they want you to use CRC32C. That's it. – LenglBoy Apr 16 '18 at 10:30
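As for the string the question asked about: the algorithm name MD5-of-0MD5-of-512CRC32C describes a composite checksum. HDFS computes a CRC32C over every 512-byte chunk, an MD5 over each block's chunk CRCs, and a final MD5 over those per-block MD5s. The hex value looks like the serialized form of Hadoop's MD5MD5CRC32FileChecksum (a 4-byte bytesPerCRC, an 8-byte crcPerBlock, then the 16-byte top-level MD5 digest); treating it that way is an assumption on my part, but the decoded fields line up with the numbers in the algorithm name. A sketch that decodes it:

    import java.nio.ByteBuffer;

    public class DecodeHdfsChecksum {
        public static void main(String[] args) {
            // Hex printed by 'hdfs dfs -checksum', read here under the
            // assumption that it is a serialized MD5MD5CRC32FileChecksum:
            // int bytesPerCRC, long crcPerBlock, 16-byte MD5 digest.
            String hex = "000002000000000000000000ccfadcfdcff630efa5628fb72620d535";

            ByteBuffer buf = ByteBuffer.allocate(hex.length() / 2);
            for (int i = 0; i < hex.length(); i += 2) {
                buf.put((byte) Integer.parseInt(hex.substring(i, i + 2), 16));
            }
            buf.flip();

            int bytesPerCRC = buf.getInt();   // 0x00000200 = 512-byte CRC32C chunks
            long crcPerBlock = buf.getLong(); // 0 here, matching the "0" in the name
            byte[] md5 = new byte[16];
            buf.get(md5);                     // the final MD5-of-MD5s digest

            System.out.println("bytesPerCRC = " + bytesPerCRC);  // prints 512
            System.out.println("crcPerBlock = " + crcPerBlock);  // prints 0
            StringBuilder sb = new StringBuilder();
            for (byte b : md5) sb.append(String.format("%02x", b));
            System.out.println("MD5 digest  = " + sb);           // ccfa...d535
        }
    }

The 512 and 0 recovered from the header are exactly the numbers embedded in "MD5-of-0MD5-of-512CRC32C", which is what makes this reading of the bytes plausible.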