I'm trying to use Dropzone.js with Flask as the backend. Dropzone configuration:

<form method="POST" action='/process_chunk' class="dropzone dz-clickable"
     id="dropper" enctype="multipart/form-data">
</form>

<script type="application/javascript">
   Dropzone.options.dropper = {
       {# https://gitlab.com/meno/dropzone/wikis/faq#chunked-uploads #}
       paramName: 'file',
       acceptedFiles: '.csv',
       chunking: true,
       forceChunking: true,
       chunkSize: 100000, // bytes
       parallelChunkUploads: true,
       maxFilesize: 1025, // megabytes
   };
</script>

and my Flask backend looks like this:

import os
import logging

from flask import request, make_response

log = logging.getLogger(__name__)


@app.route('/process_chunk', methods=['POST'])
def process_chunk():
    current_chunk = int(request.form['dzchunkindex'])

    file = request.files['file']
    save_path = os.path.join(app.config['DATA_DIR'], file.filename)

    try:
        with open(save_path, 'ab+') as f:
            # Go to the offset, i.e. after the chunks we already wrote
            f.seek(int(request.form['dzchunkbyteoffset']))
            f.write(file.stream.read())
    except OSError:
        # log.exception will include the traceback so we can see what's wrong
        log.exception('Could not write to file')
        return make_response(("Couldn't write the file to disk", 500))

    total_chunks = int(request.form['dztotalchunkcount'])

    if current_chunk + 1 == total_chunks:
        # This was the last chunk, the file should be complete and the size we expect
        if os.path.getsize(save_path) != int(request.form['dztotalfilesize']):
            log.error(f"File {file.filename} was completed, "
                      f"but has a size mismatch."
                      f"Was {os.path.getsize(save_path)} but we"
                      f" expected {request.form['dztotalfilesize']} ")
            return make_response(('Size mismatch', 500))
        else:
            log.info(f'File {file.filename} has been uploaded successfully')
    else:
        log.debug(f'Chunk {current_chunk + 1} of {total_chunks} '
                  f'for file {file.filename} complete')

    return make_response(("Chunk upload successful", 200))

This works fine if I set parallelChunkUploads to false. The chunks upload one by one, and the resulting file looks OK. For example, I use a small file (409 bytes) and set the chunk size to 50 bytes:

[screenshot: serial chunk upload]

The resulting file looks exactly like the input file. When I set parallelChunkUploads to true, the chunks upload in parallel:

[screenshot: parallel chunk upload]

but the resulting file is completely scrambled:

Original file:

cod,char,xx
01,aaaa,xx
02,bbbb,xx
03,cccc,xx
04,dddd,xx
05,eeee,xx
06,ffff,xx
07,gggg,xx
08,iiii,xx
09,kkkk,xx
10,llll,xx
11,mmmm,xx
12,gerf,xx
13,flrg,xx
14,erge,xx
15,lkro,xx
16,ergf,xx
17,kiwu,xx
18,erjg,xx
19,hytj,xx
20,utkj,xx
21,rger,xx
22,ehth,xx
23,kmik,xx
24,ergb,xx
25,ergk,xx
26,egeg,xx
27,ejer,xx
28,gtrh,xx
29,thrh,xx
30,rhtr,xx
31,gtrh,xx
32,thrh,xx
33,rhtr,xx

Uploaded file:

cod,char,xx
01,aaaa,xx
02,bbbb,xx
03,cccc,xx
0iiii,xx
09,kkkk,xx
10,llll,xx
11,mmmm,xx
12,gerf,xx
13,flrg,xx
14,erge,xx
15,lkro,xx
16,ergf4,dddd,xx
05,eeee,xx
06,ffff,xx
07,gggg,xx
08,x
21,rger,xx
22,ehth,xx
23,kmik,xx
24,ergb,xx
,xx
17,kiwu,xx
18,erjg,xx
19,hytj,xx
20,utkj,x9,thrh,xx
30,rhtr,xx
31,gtrh,xx
32,thrh,xx
33,rhtr,xx

25,ergk,xx
26,egeg,xx
27,ejer,xx
28,gtrh,xx
2

The last chunk shows as red because the frontend gets the 'Size mismatch' 500 response: the file size after the last chunk is uploaded differs from the expected total. Does anyone have an idea how to fix this?

swasher

1 Answer

I think the problem is that you are writing to the file from multiple requests in an unsafe (not thread-safe) way. You need to enclose access to the shared resource (the file) in a critical section. You could achieve this using a lock file: https://py-filelock.readthedocs.io/en/latest/
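A minimal sketch of what I mean, assuming the py-filelock package (pip install filelock) and the same Dropzone form fields your route already reads; the .lock file name derived from save_path is just my choice. One note beyond the lock itself: on POSIX, 'a' mode appends every write to the end of the file regardless of seek(), so the sketch creates the file once and then opens it with 'rb+' so the chunk actually lands at its dzchunkbyteoffset:

import os

from filelock import FileLock
from flask import request, make_response


@app.route('/process_chunk', methods=['POST'])
def process_chunk():
    file = request.files['file']
    save_path = os.path.join(app.config['DATA_DIR'], file.filename)

    # One lock per target file: concurrent chunk requests for the same
    # upload serialize here instead of interleaving their writes.
    lock = FileLock(save_path + '.lock')

    try:
        with lock:
            # Create the file on first access, then reopen with 'rb+' so
            # seek() positions are honored ('ab+' would force every write
            # to the end of the file on POSIX).
            if not os.path.exists(save_path):
                open(save_path, 'wb').close()
            with open(save_path, 'rb+') as f:
                f.seek(int(request.form['dzchunkbyteoffset']))
                f.write(file.stream.read())
    except OSError:
        log.exception('Could not write to file')
        return make_response(("Couldn't write the file to disk", 500))

    # ...the rest of your route (chunk counting, size check) stays the same.
    return make_response(("Chunk upload successful", 200))

FileLock is an OS-level lock, so it also works across processes, e.g. when the app runs under several workers on the same machine. Regards.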