5

PHP's crc32() takes a string as input, so for a file the code below will of course work:

crc32(file_get_contents("myfile.CSV"));

But if the file is huge (2 GB), this may raise an out-of-memory fatal error.

Is there any way around this to compute the checksum of huge files?

Arshdeep

3 Answers

6

On 32-bit builds, PHP doesn't support files larger than 2 GB (a 32-bit integer limitation).

A more efficient way to calculate the CRC32 of a file:

$hash = hash_file('crc32b',"myfile.CSV" );
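
One thing to keep in mind: hash_file() returns the checksum as a hexadecimal string, whereas crc32() returns an integer. Here is a minimal sketch of how the two relate, assuming a 64-bit PHP build (where crc32() is always non-negative); the temporary file and its contents are only there to make the example self-contained:

```php
<?php
// Create a small throwaway file so the example can run anywhere.
$path = tempnam(sys_get_temp_dir(), 'crc');
file_put_contents($path, "The quick brown fox jumps over the lazy dog");

// hash_file() streams the file internally and returns 8 hex digits.
$hex = hash_file('crc32b', $path);

// Assumption: 64-bit PHP, so hexdec() yields an int comparable to crc32().
$asInt = hexdec($hex);
$same  = ($asInt === crc32(file_get_contents($path)));

unlink($path);

echo $hex, "\n";                      // 414fa339
echo $same ? "match" : "differ", "\n";
```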
dev-null-dweller
2

This function in the User Contributed Notes to crc32() claims to calculate the value without loading the file in full. If it works correctly, it should eliminate any memory problems.

For a file larger than 2 GB, it is however likely to stop at the same 32-bit limitation you are encountering right now.

If possible, I would invoke an external tool that can calculate the checksum for files as large as the one at hand.
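
As an illustration of the "without loading the file in full" idea, here is a rough sketch that uses PHP's incremental hashing functions (hash_init(), hash_update(), hash_final()) rather than the user-contributed function mentioned above. The function name crc32_file_chunked and the 1 MiB chunk size are my own assumptions, and it does nothing about the 32-bit file size limitation:

```php
<?php
// Hypothetical helper: CRC32 (crc32b variant) of a file, read in fixed-size chunks
// so that memory use stays bounded regardless of the file size.
function crc32_file_chunked(string $path, int $chunkSize = 1048576): string
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Could not open '$path' for reading.");
    }

    $ctx = hash_init('crc32b');
    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false) {
            fclose($fp);
            throw new RuntimeException("Read error on '$path'.");
        }
        hash_update($ctx, $chunk);   // feed one chunk into the running checksum
    }
    fclose($fp);

    return hash_final($ctx);         // 8 hex digits, same format as hash_file()
}
```

The result should match hash_file('crc32b', $path) for the same file.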

NullUserException
Pekka
0

dev-null-dweller's answer is IMO the way to go.

However, for those who are looking for a memory-efficient PHP4 backport of hash_file('crc32b', $filename);, here is a solution based on this PHP manual comment, with some improvements:

  • It now gives exactly the same results as hash_file()
  • It supports 32 bit & 64 bit architectures.

Warning: performance is poor. I'm trying to improve it.

Note: I tried a solution based on the C source code from zaf's comment, but I could not port it to PHP quickly enough.

if (!function_exists('hash_file'))
{
    define('CRC_BUFFER_SIZE', 8192);

    function hash_file($algo, $filename, $rawOutput = false)
    {
        $mask32bit = 0xffffffff;

        if ($algo !== 'crc32b')
        {
            trigger_error("Unsupported hashing algorithm '".$algo."'", E_USER_ERROR);
            exit;
        }

        $fp = fopen($filename, 'rb');

        if ($fp === false)
        {
            trigger_error("Could not open file '".$filename."' for reading.", E_USER_ERROR);
            exit;
        }

        static $CRC32Table, $Reflect8Table;
        if (!isset($CRC32Table))
        {
            $Polynomial = 0x04c11db7;
            $topBit = 1 << 31;

            for($i = 0; $i < 256; $i++)
            {
                $remainder = $i << 24;
                for ($j = 0; $j < 8; $j++)
                {
                    if ($remainder & $topBit)
                        $remainder = ($remainder << 1) ^ $Polynomial;
                    else
                        $remainder = $remainder << 1;

                    $remainder &= $mask32bit;
                }

                $CRC32Table[$i] = $remainder;

                if (isset($Reflect8Table[$i]))
                    continue;
                $str = str_pad(decbin($i), 8, '0', STR_PAD_LEFT);
                $num = bindec(strrev($str));
                $Reflect8Table[$i] = $num;
                $Reflect8Table[$num] = $i;
            }
        }

        $remainder = 0xffffffff;
        while (!feof($fp))
        {
            $data = fread($fp, CRC_BUFFER_SIZE);
            $len = strlen($data);
            for ($i = 0; $i < $len; $i++)
            {
                $byte = $Reflect8Table[ord($data[$i])];
                $index = (($remainder >> 24) & 0xff) ^ $byte;
                $crc = $CRC32Table[$index];
                $remainder = (($remainder << 8) ^ $crc) & $mask32bit;
            }
        }

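        fclose($fp);

        // Reflect the 32-bit remainder, then apply the final XOR.
        $str = decbin($remainder);
        $str = str_pad($str, 32, '0', STR_PAD_LEFT);
        $remainder = bindec(strrev($str));
        $result = $remainder ^ 0xffffffff;

        // hash_file() returns 8 zero-padded hex digits; dechex() alone drops leading zeros.
        return $rawOutput ? strrev(pack('V', $result)) : str_pad(dechex($result), 8, '0', STR_PAD_LEFT);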
    }
}
Maxime Pacary
  • This is interesting. I guess there should be a way to hash a file block by block using the native crc32() function. I remember from implementing a crc16 function (long ago) that crc16(data.crc16(data)) == 0x0000. If there is a way to work out what data to prepend to the next block in order to reproduce the last block's crc32, you're good to go. In other words, if you can efficiently calculate a 32-bit value x from crc32(block1) such that crc32(x)=crc32(block1), then crc32(block1.block2)=crc32(x.block2). Does anyone have an idea whether this would be feasible? – Bigue Nique Jun 12 '14 at 19:23
  • Even better, I found [this post](http://php.net/manual/en/function.crc32.php#100060) which suggests that you can come up with a function crc32_combine() such that crc32_combine(crc32(block1),crc32(block2)) = crc32(block1.block2). Unfortunately, the original post it refers to is gone! The open-source zlib has an implementation of crc32_combine() we could peek at, though. I'll try digging this up. – Bigue Nique Jun 12 '14 at 19:42
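
For what it's worth, here is a rough PHP sketch of the combine idea discussed in the comments above, modelled on the GF(2) matrix approach used by zlib's crc32_combine(). It is an untuned illustration, not zlib's code: it assumes a 64-bit PHP build (so crc32() values are non-negative ints), the helper names gf2_matrix_times and gf2_matrix_square are just borrowed labels, and note that the function also needs the byte length of the second block, which the signature in the comment leaves out:

```php
<?php
// Multiply a 32x32 matrix over GF(2) (32 ints, one per input bit) by a 32-bit vector.
function gf2_matrix_times(array $mat, int $vec): int
{
    $sum = 0;
    for ($i = 0; $vec !== 0; $i++, $vec >>= 1) {
        if ($vec & 1) {
            $sum ^= $mat[$i];
        }
    }
    return $sum;
}

// Square a GF(2) matrix, i.e. apply the same operator twice.
function gf2_matrix_square(array $mat): array
{
    $square = [];
    for ($n = 0; $n < 32; $n++) {
        $square[$n] = gf2_matrix_times($mat, $mat[$n]);
    }
    return $square;
}

// Combine crc32(block1) and crc32(block2) into crc32(block1 . block2).
// $len2 is the length of block2 in bytes. Assumes 64-bit PHP.
function crc32_combine(int $crc1, int $crc2, int $len2): int
{
    if ($len2 <= 0) {
        return $crc1;
    }

    // Operator that advances a CRC by one zero bit (reflected CRC-32 polynomial).
    $odd = [0xedb88320];
    $row = 1;
    for ($n = 1; $n < 32; $n++) {
        $odd[$n] = $row;
        $row <<= 1;
    }
    $even = gf2_matrix_square($odd);  // operator for two zero bits
    $odd  = gf2_matrix_square($even); // operator for four zero bits

    // Apply $len2 zero bytes to $crc1 by repeated squaring.
    do {
        $even = gf2_matrix_square($odd);
        if ($len2 & 1) {
            $crc1 = gf2_matrix_times($even, $crc1);
        }
        $len2 >>= 1;
        if ($len2 === 0) {
            break;
        }
        $odd = gf2_matrix_square($even);
        if ($len2 & 1) {
            $crc1 = gf2_matrix_times($odd, $crc1);
        }
        $len2 >>= 1;
    } while ($len2 !== 0);

    return $crc1 ^ $crc2;
}
```

With this, crc32_combine(crc32($block1), crc32($block2), strlen($block2)) should equal crc32($block1 . $block2), so a large file could in principle be processed block by block with the native crc32().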