
I am trying to upload a large file (4 GB) with a PUT request to a DRF viewset.

During the upload, memory usage is stable. Once the upload reaches 100%, the Python runserver process takes more and more RAM and is eventually killed by the kernel. I have a logging line in the put method of this APIView, but the process is killed before that method is ever called.

I use this setting to force temporary-file usage: FILE_UPLOAD_HANDLERS = ["django.core.files.uploadhandler.TemporaryFileUploadHandler"]
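For reference, that override lives in settings.py; a minimal sketch:

# settings.py
# Stream every upload to a temporary file on disk instead of letting
# MemoryFileUploadHandler buffer it in RAM.
FILE_UPLOAD_HANDLERS = [
    "django.core.files.uploadhandler.TemporaryFileUploadHandler",
]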

Where does this memory peak come from? I guess something tries to load the whole file content into memory, but why (and where)?

More information:

  • I tried with DEBUG set to both True and False.
  • runserver runs in a Docker container behind Traefik, but AFAIK there is no size limit in Traefik, and the upload does reach 100%.
  • I do not know yet whether I would get the same behavior with daphne instead of runserver.
  • EDIT: the frontend sends a Content-Type of multipart/form-data.
  • EDIT: I have tried FileUploadParser and (FormParser, MultiPartParser) as parser_classes in my APIView; a sketch of the view is below.
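For context, the view looks roughly like this (class and field names are illustrative, not my exact code):

# views.py -- sketch of the upload view; names are illustrative
import logging

from rest_framework.parsers import FormParser, MultiPartParser
from rest_framework.response import Response
from rest_framework.views import APIView

logger = logging.getLogger(__name__)


class BigFileUploadView(APIView):
    parser_classes = (FormParser, MultiPartParser)

    def put(self, request, format=None):
        # This line never runs: the process is killed before DRF
        # dispatches the request to the view.
        logger.info("put called, files: %s", list(request.FILES))

        file_obj = request.FILES["file"]
        # With TemporaryFileUploadHandler the upload is already on disk,
        # so chunks() should not pull the whole file into memory.
        with open("/tmp/upload.bin", "wb") as destination:
            for chunk in file_obj.chunks():
                destination.write(chunk)
        return Response(status=204)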
Benjamin
  • It holds it in memory because it needs the whole body before it can store the file to storage. You can't directly stream from the client to file storage via DRF. Are you running this locally? If so, try increasing the allocated memory for your Docker container. – Dap Oct 08 '20 at 17:18
  • @Dap I use a multipart form on the frontend to "stream" it. Is this a Django limitation or a DRF one? – Benjamin Oct 08 '20 at 17:42

2 Answers


TL;DR:

This is neither a DRF nor a Django issue; it's a Daphne issue that has been known for 2.5 years. The solution is to use uvicorn, hypercorn, or something else for the time being.
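For example, assuming your project's ASGI entry point is myproject/asgi.py (swap in your own module path), replacing Daphne looks like this:

# install an alternative ASGI server
pip install uvicorn

# serve the same ASGI application Daphne was serving
uvicorn myproject.asgi:application --host 0.0.0.0 --port 8000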

Explanations

What you're seeing here is not coming from Django REST Framework:

The fact that you're mentioning Daphne reminds me of this SO answer, which describes a similar problem and points to code showing that Daphne doesn't handle large file uploads well: it loads the whole request body into RAM before passing it to the view. (That code is still present in their master branch at the time of writing.)

You're seeing the same behavior with runserver because, when installed, Daphne/Channels replaces the default runserver command with its own in order to provide WebSocket support during development.

To make sure it's the real culprit, try disabling Channels / running the default Django runserver and see for yourself whether your app is still killed by the OOM killer.
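A quick way to isolate it, assuming Channels is what pulls Daphne in: temporarily drop it from INSTALLED_APPS so manage.py runserver falls back to Django's plain dev server, then retry the upload.

# settings.py -- temporary change, only to identify the culprit
INSTALLED_APPS = [
    # "channels",  # commented out: restores Django's built-in runserver
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "rest_framework",
    # ... your own apps
]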

Wonskcalb

I don't know if it works with Django REST framework, but you can try writing the file in chunks.

        [...]
        # Grab every uploaded file for this form field
        anexo_files = request.FILES.getlist('anexo_file_' + str(k))
        for index, file in enumerate(anexo_files, start=1):
            # Build a unique destination path from the original name,
            # the file's position and a timestamp
            base, extension = os.path.splitext(str(file))
            nome_arquivo_anexo = (
                'media/uploads/' + base + '_' + str(index)
                + datetime.datetime.now().strftime('%m%d%Y%H%M%S') + extension
            )

            # Write the upload to disk chunk by chunk (see below)
            handle_uploaded_file(file, nome_arquivo_anexo)

            AnexoProjeto.objects.create(
                projeto=projeto,
                arquivo_anexo=nome_arquivo_anexo,
            )
        [...]

Where handle_uploaded_file is

def handle_uploaded_file(f, nome_arquivo):
    # Stream the uploaded file to disk chunk by chunk, so the whole
    # file never has to sit in memory at once.
    with open(nome_arquivo, 'wb+') as destination:
        for chunk in f.chunks():
            destination.write(chunk)
Caio Kretzer
  • Thanks for your answer, but as I said, the process is killed before I can access the request object in the put method (like a view method in classical Django). – Benjamin Oct 08 '20 at 20:47