Iterate over and validate large uploaded CSV files in Django

Question

I'm using the Django module django-chunked-upload to receive potentially large CSV files. I can assume the CSVs are properly formatted, but I can't assume what the delimiter is.

Upon completion of the upload, an UploadedFile object is returned. I need to validate that the correct columns are included in the uploaded CSV and that the data types in each column are correct.

Loading the file with csv.reader() doesn't work:

reader = csv.reader(uploaded_file)
next(reader)
>>> _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

This might be because uploaded_file.content_type and uploaded_file.charset are both coming through as None.

I've come up with a fairly inelegant solution to grab the header and iterate over the rows:

i = 0
header = ""
for line in uploaded_file:
    if i == 0:
        header = line.decode('utf-8')
        header_list = list(csv.reader(StringIO(header)))
        print(header_list[0])
        #validate column names
    else:
        tiny_csv = StringIO(header + line.decode('utf-8'))
        reader = csv.DictReader(tiny_csv)
        print(next(reader))
        #validate column types

I also considered trying to load the path of the actual saved file:

path = #figure out the path of the temp file
f = open(path,"r")
reader = csv.reader(f)

But I wasn't able to get the temp file path from the UploadedFile object.

Ideally I would like to create a normal reader or DictReader out of the UploadedFile object, but it seems to be eluding me. Anyone have any ideas? - Thanks

What is the type of your 'uploaded_file'? Can you print it and check? — ofnowhere, Sep 20 '18 at 18:46
It's `django.core.files.uploadedfile.UploadedFile`, which is the class linked above. (https://docs.djangoproject.com/en/2.1/_modules/django/core/files/uploadedfile/#UploadedFile) — nbwoodward, Sep 20 '18 at 18:51
I have a clue. Can you check whether 'reader = csv.reader(uploaded_file.seek(0))' solves the issue? — ofnowhere, Sep 20 '18 at 18:55
Interesting it spits out a different error: `reader = csv.reader(uploaded_file.seek(0)) TypeError: argument 1 must be an iterator` — nbwoodward, Sep 20 '18 at 19:00
what if you do uploaded_file = uploaded_file.seek(0) and then use it. — ofnowhere, Sep 20 '18 at 19:05
That also throws an error... `.seek()` doesn't return a new object, it mutates the uploaded_file object. So `foo = uploaded_file.seek(0)` then `print(foo)` gives `0`. — nbwoodward, Sep 20 '18 at 19:40
The issue isn't with the position of the file cursor; it's that the UploadedFile object is a binary file. It looks like the csv module in python2 may handle this differently than in python3 (https://stackoverflow.com/questions/24662571/python-import-csv-to-list), but I'm using python3 (required by django 2.1). — nbwoodward, Sep 20 '18 at 19:45

score 0 · Accepted Answer · answered Sep 25 '18 at 20:46

The answer lies in chunked_upload/models.py, which defines this method:

def get_uploaded_file(self):
    self.file.close()
    self.file.open(mode='rb')  # mode = read+binary
    return UploadedFile(file=self.file, name=self.filename,
                        size=self.offset)
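
The effect of the binary mode can be reproduced outside Django. In this minimal sketch, an io.BytesIO stands in for the file opened with mode='rb' (an assumption for illustration; the sample data is made up). csv.reader rejects it because iterating a binary file yields bytes, and accepts it once the stream is wrapped in a text layer with io.TextIOWrapper:

```python
import csv
import io

# Stand-in for the file returned by get_uploaded_file(): a binary stream.
binary_file = io.BytesIO(b"name;age\nalice;30\n")

try:
    next(csv.reader(binary_file))
except csv.Error as exc:
    # Same failure as in the question: the reader received bytes, not str.
    error_message = str(exc)

# Wrapping the same stream in a text layer gives csv.reader the str lines it expects.
binary_file.seek(0)
text_file = io.TextIOWrapper(binary_file, encoding="utf-8", newline="")
first_row = next(csv.reader(text_file, delimiter=";"))
```

The encoding is assumed to be UTF-8 here, as in the question's own workaround.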

So when you define your own upload model, you can override this method to open the file with mode='r' (text mode) instead:

# myapp/models.py

from django.db import models
from chunked_upload.models import ChunkedUpload
from django.core.files.uploadedfile import UploadedFile


class FileUpload(ChunkedUpload):
    def get_uploaded_file(self):
        self.file.close()
        self.file.open(mode='r')  # mode = read, text (not binary)
        return UploadedFile(file=self.file, name=self.filename,
                            size=self.offset)

This allows you to take the returned UploadedFile instance and parse it as a csv:

def on_completion(self, uploaded_file, request):
    reader = csv.reader(uploaded_file)
    ...
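
Once the file yields text, the validation from the question can be done in a single pass. The following is a rough sketch, not part of django-chunked-upload: validate_csv, EXPECTED_COLUMNS, and the column names are hypothetical, and it assumes the file object supports read() and seek(). It uses csv.Sniffer to guess the delimiter from the first few kilobytes, checks the header, then tries to convert each value row by row:

```python
import csv
import io

# Hypothetical schema: column name -> converter that raises ValueError on bad data.
EXPECTED_COLUMNS = {"id": int, "price": float, "name": str}


def validate_csv(text_file, expected=EXPECTED_COLUMNS, sample_size=4096):
    """Return a list of error strings; an empty list means the file passed."""
    # Guess the delimiter from the start of the file, then rewind.
    sample = text_file.read(sample_size)
    text_file.seek(0)
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    except csv.Error:
        return ["could not detect the delimiter"]

    reader = csv.DictReader(text_file, dialect=dialect)
    missing = set(expected) - set(reader.fieldnames or [])
    if missing:
        return ["missing columns: " + ", ".join(sorted(missing))]

    errors = []
    # Rows are read one at a time, so the whole file is never held in memory.
    for line_number, row in enumerate(reader, start=2):
        for column, convert in expected.items():
            try:
                convert(row[column])
            except (TypeError, ValueError):
                errors.append(
                    f"line {line_number}: bad {column!r} value {row[column]!r}"
                )
    return errors
```

Inside on_completion this could be called as errors = validate_csv(uploaded_file). The candidate delimiters passed to sniff() are a guess at what uploads might contain; Sniffer is a heuristic and can misjudge unusual files, so the list may need adjusting.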
