1

I borrowed this code for saving a file stream to disk, and it works except for when the file is less than 1kb in size. I get this error:

in stuff_uploaded:
copy(theFile.file.name, './tmp/'+theFile.filename) #saves temporary file to /cp2/tmp/

AttributeError: 'cStringIO.StringO' object has no attribute 'name'

@cherrypy.expose
@cherrypy.tools.noBodyProcess()
def stuff_uploaded(self, theFile=None):
    import cgi
    import tempfile
    # convert the header keys to lower case
    lcHDRS = {key.lower():val for key, val in cherrypy.request.headers.iteritems()}
    class myFieldStorage(cgi.FieldStorage):
        """uses a named temporary file instead of the default non-named file; keeping it visibile (named), allows us to create a
        2nd link after the upload is done, thus avoiding the overhead of making a copy to the destination filename."""
        def make_file(self, binary=None):
            return tempfile.NamedTemporaryFile()
    formFields = myFieldStorage(fp=cherrypy.request.rfile,
                                headers=lcHDRS,
                                environ={'REQUEST_METHOD':'POST'},
                                keep_blank_values=True)
    theFile = formFields['theFile']
    # we now create a 2nd link to the file, using the submitted filename.
    from shutil import copy
    copy(theFile.file.name, './tmp/'+theFile.filename) #saves temporary file 
    msgs = csv_to_survey.match_fieldnames('./tmp/'+theFile.filename)
    return './tmp/'+theFile.filename

So what can I do to ensure that cStringIO.StringO handles small uploaded files?

Marc Maxmeister
  • 4,191
  • 4
  • 40
  • 54

1 Answers1

2

Just open and write the file directly:

with open('./tmp/'+theFile.filename, "w") as f:
    f.write(theFile.file.getvalue())

Or to deal with it regardless of whether the file is on disk or StringIO, use it as a file-like object:

import shutil

with open('./tmp/'+theFile.filename, "w") as f:
    # If the file pointer might not be at the beginning of theFile.file, add:
    # theFile.file.seek(0)
    shutil.copyfileobj(theFile.file, f)
    # While:
    #     f.write(theFile.file.read())
    # would work most of the time, it involves holding the whole contents of the
    # file in memory at once (which you want to avoid; that's why CherryPy
    # uses temp files for larger data). shutil.copyfileobj does block by
    # block copies, which have fixed peak memory usage while still running
    # (almost) as fast

Note: This (and your original solution) is insecure, since uploading the same file twice will overwrite the previous file, and (depending on server filters on names) file names could traverse the file system to overwrite unintended files outside the tmp directory.

ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
  • do you mean insecure in the sense that files are not protected from being overwritten, or because it can be exploited some how? If I added a random string to the filename that would solve the problem of overwriting files twice. – Marc Maxmeister Aug 04 '15 at 16:09
  • 1
    @MarcMaxson Using a random, unique file name is indeed a much better idea than using user input, which might even contain slashes! Best not to even consider the user-provided filename, it cannot be trusted – Niklas B. Aug 04 '15 at 16:10
  • what if I only retained ascii letters and spaces from the filename? This is only used by admin anyways and behind a password-login, so not really unsafe, but always good to avoid any potential holes. – Marc Maxmeister Aug 04 '15 at 16:13
  • I was using the file stream approach (cStringIO.StringO) because it would allow me to save 50MB or 100MB files without putting them in memory twice. – Marc Maxmeister Aug 04 '15 at 16:13
  • @MarcMaxson You can read and write the file in chunks of, say, 1MB. Then you will never hold more than 1MB in memory. Yes I guess retaining a whitelist of characters in the filename is safe, but even there mistakes can be made involving the encoding, so be careful. – Niklas B. Aug 04 '15 at 16:15
  • This seems to work in a try... except loop for when the other method fails. Thanks! – Marc Maxmeister Aug 04 '15 at 16:23
  • while solving the problem, your solution doesn't really WHY it fails in the first place with cStringIO.StringO. – Marc Maxmeister Aug 04 '15 at 16:24
  • Because shutil.copy expects a file name from the local file system. The name you get from the form was the name the file was submitted under, but it's not a local file by any means, it's just a fake file in memory. – ShadowRanger Aug 04 '15 at 19:12
  • So why does it ONLY cause an error when the file is smaller than 1,024 bytes? works fine otherwise... – Marc Maxmeister Aug 05 '15 at 20:28
  • Ah, I missed that aspect, sorry. The problem there is that CherryPy only invokes make_file if the file size is >= 1000 bytes; below that threshold, it use in-memory storage to avoid the overhead of file I/O. See "Final Notes" here: http://tools.cherrypy.org/wiki/DirectToDiskFileUpload – ShadowRanger Aug 06 '15 at 04:46