I'm using the Django module django-chunked-upload to receive potentially large CSV files. I can assume the CSVs are properly formatted, but I can't assume what the delimiter is.
Upon completion of the upload, an UploadedFile object is returned. I need to validate that the correct columns are included in the uploaded CSV and that the data types in each column are correct.
Loading the file with csv.reader() doesn't work:
reader = csv.reader(uploaded_file)
next(reader)
>>> _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
This might be because uploaded_file.content_type and uploaded_file.charset are both coming through as None.
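The error itself can be reproduced without Django: csv.reader raises it for any file-like object that yields bytes rather than strings. The sketch below uses an io.BytesIO as a stand-in for the uploaded file (an assumption, as is the UTF-8 encoding and the sample data) and shows that wrapping the binary stream in io.TextIOWrapper gives csv.reader the strings it expects. Whether the UploadedFile returned by django-chunked-upload can be wrapped the same way is not confirmed here.

```python
import csv
import io

# Stand-in for the uploaded file: a binary file-like object (assumption).
raw = io.BytesIO(b"name;age\nalice;30\n")

# Iterating a binary stream yields bytes, which csv.reader rejects.
try:
    next(csv.reader(raw))
    binary_error = None
except csv.Error as exc:
    binary_error = str(exc)

# Rewind and wrap the stream so that iteration yields decoded strings.
raw.seek(0)
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
rows = list(csv.reader(text, delimiter=";"))
print(binary_error)
print(rows)
```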
I've come up with a fairly inelegant solution to grab the header and iterate over the rows:
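import csv
from io import StringIO

header = ""
for i, line in enumerate(uploaded_file):
    if i == 0:
        # first line: decode it and parse the column names
        header = line.decode('utf-8')
        header_list = list(csv.reader(StringIO(header)))
        print(header_list[0])
        # validate column names
    else:
        # re-attach the header to each row so DictReader can parse it
        tiny_csv = StringIO(header + line.decode('utf-8'))
        reader = csv.DictReader(tiny_csv)
        print(next(reader))
        # validate column types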
I also considered trying to load the path of the actual saved file:
path = ...  # figure out the path of the temp file
with open(path, "r", newline="") as f:
    reader = csv.reader(f)
But I wasn't able to get the temp file path from the UploadedFile object.
Ideally I would like to create a normal reader or DictReader out of the UploadedFile object, but it seems to be eluding me. Does anyone have any ideas? Thanks.
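A minimal sketch of what the desired end result could look like, under stated assumptions: the upload is available as a seekable binary file-like object, it is UTF-8 encoded, and the delimiter is one of a small candidate set that csv.Sniffer can detect from the first few kilobytes. The function name validate_csv, the EXPECTED_COLUMNS schema, and the sample data are all hypothetical and only serve the illustration.

```python
import csv
import io

# Hypothetical schema: expected column name -> converter that raises on bad data.
EXPECTED_COLUMNS = {"name": str, "age": int, "score": float}


def validate_csv(binary_file, encoding="utf-8", sample_size=4096):
    """Return a list of error messages; an empty list means the file passed."""
    text = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    try:
        # Guess the delimiter from a sample, then rewind to the start.
        sample = text.read(sample_size)
        text.seek(0)
        dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
        reader = csv.DictReader(text, dialect=dialect)

        # Check that every expected column is present in the header.
        missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
        if missing:
            return [f"missing columns: {sorted(missing)}"]

        # Check that every value converts to the expected type.
        errors = []
        for row in reader:
            for column, convert in EXPECTED_COLUMNS.items():
                try:
                    convert(row[column])
                except (TypeError, ValueError):
                    errors.append(
                        f"line {reader.line_num}: bad value "
                        f"{row[column]!r} in column {column!r}"
                    )
        return errors
    finally:
        # Detach so the wrapper does not close the underlying file on cleanup.
        text.detach()


# Usage with in-memory stand-ins for the uploaded file.
good = io.BytesIO(b"name,age,score\nalice,30,1.5\n")
bad = io.BytesIO(b"name;age;score\nalice;30;1.5\nbob;x;2.0\n")
good_errors = validate_csv(good)
bad_errors = validate_csv(bad)
print(good_errors)
print(bad_errors)
```

csv.Sniffer is a heuristic and can guess wrong or raise csv.Error on unusual input, so restricting the candidate delimiters as above is a design choice rather than a guarantee.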