
I am trying to upload a video file to S3, but only after putting the upload into a task queue with Celery, so that the user can do other things while the video is being uploaded.

My views.py, which calls the Celery tasks:

def upload_blob(request, iterator, interview_id, candidate_id, question_id):
    try:
        interview_obj = Interview.objects.get(id=interview_id)
    except ObjectDoesNotExist:
        interview_obj = None
    current_interview = interview_obj
    if request.method == 'POST':
        print("inside POST")
        # newdoc1 = Document(upload=request.FILES['uploaded_video'], name="videos/interview_"+interview_id+"_candidate_"+candidate_id+"_question_"+question_id)
        # newdoc1.save()
        save_document_model.delay(request.FILES['uploaded_video'],"videos/interview_"+interview_id+"_candidate_"+candidate_id+"_question_"+question_id)
        # newdoc2 = Document(upload=request.FILES['uploaded_audio'], name="audios/interview_"+interview_id+"_candidate_"+candidate_id+"_question_"+question_id)
        # newdoc2.save()
        save_document_model.delay(request.FILES['uploaded_audio'],"audios/interview_"+interview_id+"_candidate_"+candidate_id+"_question_"+question_id)
        iterator = str(int(iterator) + 1)

        return HttpResponseRedirect(reverse('candidate:show_question', kwargs={'iterator': iterator,'interview_id':current_interview.id,'question_id':question_id}))
    else:

        return render(request, 'candidate/record_answer.html')

The actual Celery tasks.py:

@task(name="save_document_model")
def save_document_model(uploaded_file, file_name):

    newdoc = Document(upload=uploaded_file, name=file_name)
    newdoc.save()

    logger.info("document saved successfully")
    return HttpResponse("document saved successfully")

Document model:

def upload_function(instance, filename):
    getname = instance.name
    customlocation = os.path.join(settings.AWS_S3_CUSTOM_DOMAIN, settings.MEDIAFILES_LOCATION, getname)
    # Add other filename logic here
    return getname # Return the end filename where you want it saved.

class Document(models.Model):
    name = models.CharField(max_length=25)
    uploaded_at = models.DateTimeField(auto_now_add=True)
    upload = models.FileField(upload_to=upload_function)

settings.py:

AWS_ACCESS_KEY_ID = '**********************'
AWS_SECRET_ACCESS_KEY = '**************************'
AWS_STORAGE_BUCKET_NAME = '*********'
AWS_S3_CUSTOM_DOMAIN = '%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME
AWS_S3_OBJECT_PARAMETERS = {
    'CacheControl': 'max-age=86400',
}
AWS_LOCATION = 'static'
AWS_DEFAULT_ACL = None

MEDIAFILES_LOCATION = 'uploads/'
DEFAULT_FILE_STORAGE = 'watsonproj.storage_backends.MediaStorage'

# CELERY STUFF
BROKER_URL = 'redis://localhost:6379'
CELERY_RESULT_BACKEND = 'redis://localhost:6379'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = 'Africa/Nairobi'
CELERY_IMPORTS=("candidate.tasks")

Direct upload works without Celery, but with Celery I am getting this error:

Object of type 'InMemoryUploadedFile' is not JSON serializable


1 Answer


Celery gives the option to configure how the task payloads are serialized.

Your project settings configure the task serializer as JSON: CELERY_TASK_SERIALIZER = 'json'.

request.FILES['<input>'] is an instance of django.core.files.uploadedfile.InMemoryUploadedFile and can't be encoded by the json serializer (see the list of supported types).
While there are ways to serialize files as binary data, if your users upload large files your application risks consuming large amounts of memory.
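For illustration, the failure reduces to roughly this; a self-contained sketch using a made-up in-memory file rather than your actual request:

import io
import json

from django.core.files.uploadedfile import InMemoryUploadedFile

# Build a small fake upload; in your view the real one comes from request.FILES['uploaded_video'].
fake_upload = InMemoryUploadedFile(
    io.BytesIO(b"fake video bytes"),  # file
    "uploaded_video",                 # field_name
    "answer.webm",                    # name
    "video/webm",                     # content_type
    16,                               # size
    None,                             # charset
)

try:
    # This is essentially what the json task serializer attempts with your task arguments.
    json.dumps({"uploaded_file": fake_upload})
except TypeError as exc:
    print(exc)  # Object of type 'InMemoryUploadedFile' is not JSON serializable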

You can consider using django.core.files.uploadhandler.TemporaryFileUploadHandler in any case and forwarding the temporary file path (request.FILES['<input>'].temporary_file_path()) instead of request.FILES['<input>'] in the task payload.
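A minimal sketch of how the two .delay(...) calls in your view would change, keeping your existing name-building logic:

save_document_model.delay(
    request.FILES['uploaded_video'].temporary_file_path(),
    "videos/interview_" + interview_id + "_candidate_" + candidate_id + "_question_" + question_id,
)
save_document_model.delay(
    request.FILES['uploaded_audio'].temporary_file_path(),
    "audios/interview_" + interview_id + "_candidate_" + candidate_id + "_question_" + question_id,
)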

To enforce the temporary-file behaviour, set FILE_UPLOAD_MAX_MEMORY_SIZE = 0 in your project settings. Caveat: this deactivates the MemoryFileUploadHandler for your entire project.
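The settings change is a one-liner; alternatively (a variation I would consider, not something your setup requires) you can list only the temporary-file handler in FILE_UPLOAD_HANDLERS:

# settings.py
FILE_UPLOAD_MAX_MEMORY_SIZE = 0  # every non-empty upload is streamed to a temp file on disk

# or, equivalently, drop the MemoryFileUploadHandler altogether:
FILE_UPLOAD_HANDLERS = [
    'django.core.files.uploadhandler.TemporaryFileUploadHandler',
]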

In the task definition, you can then open the file from that path, save a new Document, and clean up the temporary file.

import os

from django.core.files import File

@task(name="save_document_model")
def save_document_model(file_path, file_name):

    # Open the temp file in binary mode and wrap it in django.core.files.File
    # so the FileField can stream it to the configured S3 media storage on save().
    with open(file_path, 'rb') as f:
        newdoc = Document(upload=File(f), name=file_name)
        newdoc.save()

    logger.info("document saved successfully")

    os.remove(file_path)  # clean up the temporary file on the local disk

    # Return a plain string: the task result also goes through the json serializer,
    # so returning an HttpResponse would fail for the same reason the file object did.
    return "document saved successfully"
  • Is there any other way, like saving the file to the local server first and then transferring it to S3? In that case, I could just pass the document id to the Celery task. – Somnath Das Sep 29 '18 at 11:34
  • My answer explains saving to the local server first, transferring to S3, then cleaning up the saved file. I'm not clear on your comment. – Oluwafemi Sule Sep 29 '18 at 12:42