For educational purposes, I wanted to implement chunked file upload. How do you know when all of the chunks have been uploaded?

I tried moving the chunks out of temp and renaming them so they are in the correct order, then merging them together when the last chunk arrives. However, the last piece sent is not necessarily the last piece received, I guess. So fopen() on the earlier chunks fails because they have not been created yet, and I get a final file whose size is exactly the size of the last chunk.

I believe I could send the chunks one by one using the .onload event on the xhr; that way I wouldn't even have to move them out of the PHP temp directory, but I'm wondering whether there are other solutions.

Some basic code to please you:

function upload(file) {
  var BYTES_PER_CHUNK = 2097152, // 2 MB per chunk
  size = file.size,
  NUM_CHUNKS = Math.max(Math.ceil(size / BYTES_PER_CHUNK), 1),
  start = 0, end = BYTES_PER_CHUNK, num = 1;

  var chunkUpload = function(blob) {
    var fd = new FormData();
    var xhr = new XMLHttpRequest();

    fd.append('upload', blob, file.name);
    fd.append('num', num);
    fd.append('num_chunks', NUM_CHUNKS);
    xhr.open('POST', '/somedir/upload.php', true);
    xhr.send(fd);
  }

  while (start < size) {
    chunkUpload(file.slice(start, end));
    start = end;
    end = start + BYTES_PER_CHUNK;
    num++;
  }
}
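
For comparison, the one-by-one variant mentioned above could be sketched roughly as follows. This is only a sketch under assumptions: `uploadSequentially`, `sendChunk` and `xhrSendChunk` are hypothetical names, and `sendChunk` is assumed to return a Promise that resolves once the server has answered for that chunk, so the next chunk is only sent after the previous one has arrived.

```javascript
// Send chunks strictly in order: chunk n+1 is not sent until chunk n is done.
// sendChunk(blob, num, numChunks) is a hypothetical callback returning a Promise.
async function uploadSequentially(file, sendChunk, bytesPerChunk = 2097152) {
  const numChunks = Math.max(Math.ceil(file.size / bytesPerChunk), 1);
  for (let num = 1, start = 0; num <= numChunks; num++, start += bytesPerChunk) {
    // slice() clamps the end offset to the file size, so the last chunk may be shorter.
    await sendChunk(file.slice(start, start + bytesPerChunk), num, numChunks);
  }
  return numChunks;
}

// One possible sendChunk built on XMLHttpRequest (browser only).
// It resolves in xhr.onload, i.e. after the server has responded.
function xhrSendChunk(url, fileName) {
  return function (blob, num, numChunks) {
    return new Promise(function (resolve, reject) {
      const fd = new FormData();
      fd.append('upload', blob, fileName);
      fd.append('num', num);
      fd.append('num_chunks', numChunks);

      const xhr = new XMLHttpRequest();
      xhr.open('POST', url, true);
      xhr.onload = function () {
        if (xhr.status >= 200 && xhr.status < 300) {
          resolve();
        } else {
          reject(new Error('HTTP ' + xhr.status));
        }
      };
      xhr.onerror = function () {
        reject(new Error('network error'));
      };
      xhr.send(fd);
    });
  };
}
```

In a browser this would be called as `uploadSequentially(file, xhrSendChunk('/somedir/upload.php', file.name))`. The trade-off is that only one request is in flight at a time, so the chunk numbered `num_chunks` really is the last one the server receives.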

And PHP:

$target_path = ROOT.'/upload/';

$tmp_name = $_FILES['upload']['tmp_name'];
$filename = $_FILES['upload']['name'];
$target_file = $target_path.$filename;
$num = $_POST['num'];
$num_chunks = $_POST['num_chunks'];

move_uploaded_file($tmp_name, $target_file.$num);

if ($num === $num_chunks) {
  for ($i = 1; $i <= $num_chunks; $i++) {

    $file = fopen($target_file.$i, 'rb');
    $buff = fread($file, 2097152);
    fclose($file);

    $final = fopen($target_file, 'ab');
    $write = fwrite($final, $buff);
    fclose($final);

    unlink($target_file.$i);
  }
}
Coldark

1 Answer

Sorry for my previous comments, I misunderstood the question. This question is interesting and fun to play with.

The expression you are looking for is this:

$target_path = ROOT.'/upload/';

$tmp_name = $_FILES['upload']['tmp_name'];
$filename = $_FILES['upload']['name'];
$target_file = $target_path.$filename;
$num = $_POST['num'];
$num_chunks = $_POST['num_chunks'];

move_uploaded_file($tmp_name, $target_file.$num);

// count the number of chunks uploaded so far
$chunksUploaded = 0;
for ($i = 1; $i <= $num_chunks; $i++) {
    if (file_exists($target_file.$i)) {
        ++$chunksUploaded;
    }
}

// and THAT's what you were asking for:
// when this triggers, all of your chunks are uploaded
// ($_POST values are strings, so cast before the strict comparison)
if ($chunksUploaded === (int) $num_chunks) {

    /* here you can reassemble chunks together */
    for ($i = 1; $i <= $num_chunks; $i++) {

      $file = fopen($target_file.$i, 'rb');
      $buff = fread($file, 2097152);
      fclose($file);

      $final = fopen($target_file, 'ab');
      $write = fwrite($final, $buff);
      fclose($final);

      unlink($target_file.$i);
    }
}

And this must be mentioned:

The weak point of my version shows up when you expect the files

  • 'tmp-1',

  • 'tmp-2',

  • 'tmp-3'

but, let's assume we were interrupted after sending 'tmp-2'. That leftover 'tmp-2' pollutes the tmp folder and will interfere with future uploads of a file with the same name. That would be a sleeping bomb.

To counter that, you must find a way to change the tmp prefix to something more unique.

  • 'tmp-ABCew-1',

  • 'tmp-ABCew-2',

  • 'tmp-ABCew-3'

is a bit better, where 'ABCew' could be called a 'chunksSessionId': you generate it randomly and provide it when sending your POST. Still, collisions are possible as the space of random names depletes. You could add time to the equation; for example,

  • 'tmp-ABCew-2016-03-17-00-11-22--1',

  • 'tmp-ABCew-2016-03-17-00-11-22--2',

  • 'tmp-ABCew-2016-03-17-00-11-22--3'

is much more collision-resistant, but it is difficult to implement. There is a whole can of worms here: the client's date and time are controlled by the client and could be spoofed, so this data is unreliable.

So making the tmp name unique is a complex task. Designing a system that makes it reliable is an interesting problem ^ ^ You can play with that.
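
As a starting point, the random 'chunksSessionId' idea could be sketched on the client like this. It is a minimal sketch, not a complete design: `newChunksSessionId` and `chunkName` are hypothetical helper names, and the 'tmp-<id>-<num>' layout simply mirrors the examples above. It uses the Web Crypto `getRandomValues` call rather than `Math.random`, so 16 random bytes make accidental collisions very unlikely.

```javascript
// Browsers and recent Node versions expose Web Crypto as globalThis.crypto;
// older Node versions fall back to the built-in 'crypto' module.
const webCrypto = globalThis.crypto || require('crypto').webcrypto;

// Hypothetical helper: returns 32 hex characters (16 random bytes).
function newChunksSessionId() {
  const bytes = new Uint8Array(16);
  webCrypto.getRandomValues(bytes);
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Hypothetical helper: builds the temporary name for one chunk of one upload.
function chunkName(sessionId, num) {
  return 'tmp-' + sessionId + '-' + num;
}
```

The id would be generated once per file and sent along with every chunk, for example with `fd.append('session', sessionId)` (the field name is an assumption). Since the value comes from the client, the server should still validate it, for instance by accepting only hex characters, before using it in a file path.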

Jaiden Snow
  • Thank you, don't know why I didn't think of simply checking if the files already exist. About temp names: yes, that's something I'm taking into account too. Well, the problem might occur only if a user tries to upload two or more files with the same name at the same time, because I use the real file name as an identifier, but I think I'll use some sort of timestamp to identify them though. Another way would be to create a temporary folder for each chunk pile and use some random name for it. Thank you for your great answer once again! – Coldark Mar 16 '16 at 22:01
  • What's the reason for the specific value 2097152 here `fread($file, 2097152)`? – Luca Reghellin Oct 16 '18 at 16:21
  • @LucaReghellin Sorry for such a late response, but in case someone else stumbles upon it - it is the amount of bytes to be read. Chunks are supposed to be 2MB so exactly 2097152 bytes. I guess you can just use `filesize($target_file.$i)` instead. – Coldark Mar 12 '19 at 16:14
  • As you mentioned designing a system to handle file names: how about the JS first sends a request for a token, and PHP sends back a unique token that can be used as the directory name for temporarily holding the file chunks? Then the JS can send the chunks with that token attached, so PHP can check whether the token (also the directory name) exists and can easily store each chunk in that directory. When all chunks are uploaded, it can create the permanent file and delete the token (directory). – Airy Aug 24 '22 at 16:55