Question in short

How can one "resume hash_context" in PHP?


Background & Current Situation

The software's goal is to receive a big file chunk by chunk (synchronously), calculate both the MD5 and SHA1 of that file, and generate a download link (to the full file). Something like RapidShare, but instead of sending the file in one piece, the client sends it chunk by chunk.

Currently the software is working with this logic:
It receives file chunks (10MB chunks of a big file) synchronously, per file session. After receiving all the chunks I need to calculate the MD5 and SHA1 of the file, which takes a very long time for files over 1GB.

Pseudo code for file finalizer (when all chunks are received):

$fileKey = $_GET['KEY'];
$ctxMd5 = hash_init('md5');
$ctxSha1 = hash_init('sha1');

$fh = fopen('file/containing/all_chunks.tmp', 'rb'); // binary-safe read
while (!feof($fh)) {
    $data = fread($fh, CHUNK_SIZE);
    hash_update($ctxMd5, $data);
    hash_update($ctxSha1, $data);
}
fclose($fh);
$md5 = hash_final($ctxMd5);
$sha1 = hash_final($ctxSha1);

saveFileHashes($fileKey, $md5, $sha1);

The problem is that once all chunks are uploaded, the user has to wait until the script has calculated both hashes, which is very frustrating.


The Solution to the Problem

I would like to change the receive logic this way:
Instead of calculating the hashes after all the chunks have been received and saved, I would like to do the following each time a chunk is received: resume the hashing context (or create a new one), update it with the chunk, save the context state, and save the file chunk.

Pseudo code for chunk receiver:

$chunkData = getIncomingChunkData();
$fileKey = $_GET['KEY'];

$ctxMd5 = resumeMd5HashingContext($fileKey);
$ctxSha1 = resumeSha1HashingContext($fileKey);

hash_update($ctxMd5, $chunkData);
hash_update($ctxSha1, $chunkData);

saveMd5HashingContext($fileKey, $ctxMd5);
saveSha1HashingContext($fileKey, $ctxSha1);

appendFileChunk($fileKey, $chunkData);

The Problem

The main problem is that PHP resources are not serializable, nor does hash_init provide a way to resume a context.

How can I achieve everything described above?

ROMANIA_engineer
George
  • I don't know how you can serialize the context so that another process can use it, but the [hash_copy](https://www.php.net/manual/en/function.hash-copy.php) and [hash_update](https://www.php.net/manual/en/function.hash-update.php) functions let you save and resume a hashing context. – Enrico Dias Apr 27 '19 at 00:03

1 Answer

Just an idea to overcome the problem: maybe you could separate the reception process from the concatenation/hashing process.

When you initialize the transfer, your script could start a persistent script that runs in the background, waits for the chunks, updates the hashes with each chunk as it becomes available, appends the chunk to the file, and exits once all chunks have been received, all in a single execution.

Your reception script would simply move the uploaded chunk files to a temporary directory to make them available to the persistent process.
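
A rough sketch of what such a background worker might look like. The function name `assembleAndHash`, the chunk naming scheme (`0.part`, `1.part`, ...) and the polling interval are assumptions made for illustration, not something taken from the question. It also assumes the reception script moves each chunk into place atomically (for example with a rename on the same filesystem), so the worker never reads a half-written chunk.

```php
<?php
/**
 * Hypothetical worker: consumes chunk files in order and keeps both hash
 * contexts alive in memory for the whole transfer, so nothing has to be
 * serialized between chunks.
 *
 * Assumed layout: chunks appear as "$chunkDir/0.part", "$chunkDir/1.part", ...
 * and the total number of chunks is known up front.
 */
function assembleAndHash(string $chunkDir, int $totalChunks, string $targetPath, int $timeoutSeconds = 3600): array
{
    $ctxMd5  = hash_init('md5');
    $ctxSha1 = hash_init('sha1');

    $out = fopen($targetPath, 'wb');
    if ($out === false) {
        throw new RuntimeException("Cannot open $targetPath for writing");
    }

    for ($i = 0; $i < $totalChunks; $i++) {
        $chunkPath = "$chunkDir/$i.part";
        $deadline  = time() + $timeoutSeconds;

        // Poll until the reception script has moved the chunk into place.
        while (!is_file($chunkPath)) {
            if (time() >= $deadline) {
                fclose($out);
                throw new RuntimeException("Timed out waiting for chunk $i");
            }
            usleep(200000); // 0.2 s between checks
            clearstatcache(true, $chunkPath);
        }

        $data = file_get_contents($chunkPath);
        if ($data === false) {
            fclose($out);
            throw new RuntimeException("Cannot read chunk $i");
        }

        // Feed the same bytes to both contexts, then append to the full file.
        hash_update($ctxMd5, $data);
        hash_update($ctxSha1, $data);
        fwrite($out, $data);
        unlink($chunkPath);
    }

    fclose($out);

    return [
        'md5'  => hash_final($ctxMd5),
        'sha1' => hash_final($ctxSha1),
    ];
}
```

Because the worker reads each chunk only once, the hashes are ready almost as soon as the last chunk arrives. How the worker is launched and how it reports the result back (for instance by calling something like `saveFileHashes`) is left open here.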

Bigue Nique
  • The question is how to resume the context, not about finding a workaround to the problem. Don't get me wrong, your solution is valid and it could be used, it's just that I was interested in a way to resume the context not finding a different approach. – George Aug 28 '18 at 23:03
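
For reference, on newer PHP versions the context itself can be resumed. This is a minimal sketch assuming PHP 8.1 or later, where `hash_init` returns a `HashContext` object that can be passed to `serialize()` and `unserialize()` (HMAC contexts are excluded). The helper names `resumeHashContext` and `saveHashContext` and the state-file location are hypothetical. On older versions this does not work.

```php
<?php
// Assumes PHP 8.1+: HashContext objects are serializable there.
// Helper names and the state-file path are made up for illustration.

/** Load a previously saved context, or start a fresh one for $algo. */
function resumeHashContext(string $stateFile, string $algo): HashContext
{
    if (is_file($stateFile)) {
        $raw = file_get_contents($stateFile);
        if ($raw !== false && $raw !== '') {
            // Only allow HashContext to be instantiated from the stored data.
            $ctx = unserialize($raw, ['allowed_classes' => [HashContext::class]]);
            if ($ctx instanceof HashContext) {
                return $ctx;
            }
        }
    }
    return hash_init($algo);
}

/** Persist the current state of the context between requests. */
function saveHashContext(string $stateFile, HashContext $ctx): void
{
    file_put_contents($stateFile, serialize($ctx), LOCK_EX);
}
```

Each chunk request would then call `resumeHashContext`, `hash_update`, and `saveHashContext` in turn, and the final request would call `hash_final` on the resumed context. The state file should be stored somewhere users cannot write to, since its contents are passed to `unserialize()`.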