Normally I process files in Python using a with statement, as in this chunk for downloading a resource via HTTP:

with open(filename, "wb") as file:
    for chunk in request.iter_content(chunk_size=1024):
        if chunk:
            file.write(chunk)
            file.flush()

But this assumes I know the filename. Suppose I want to use tempfile.mkstemp(). This function returns a handle to an open file and a pathname, so using open in a with statement would be wrong.

I've searched around a bit and found lots of warnings about being careful to use mkstemp properly. Several blog articles nearly shout when they say do NOT throw away the integer returned by mkstemp. There are discussions about the os-level filehandle being different from a Python-level file object. That's fine, but I haven't been able to find the simplest coding pattern that would ensure that

  • mkstemp is called to get a file to be written to
  • after writing, the Python file and its underlying OS file handle are both closed cleanly, even in the event of an exception. This is precisely the kind of behavior we get with a with open(...) pattern.

So my question is: is there a nice way in Python to create and write to a mkstemp-generated file, perhaps using a different kind of with statement, or do I have to manually call things like fdopen or close? It seems there should be a clear pattern for this.

Ray Toal
  • First, is there a reason you need to use `mkstemp` instead of the simpler and higher-level `NamedTemporaryFile`? – abarnert Jan 03 '14 at 21:23
  • `NamedTemporaryFile` is correct. The warnings about it not behaving exactly the same on different platforms were a little disconcerting at first, and led me to the underlying `mkstemp`, but I'm okay with it now. – Ray Toal Jan 03 '14 at 21:46

1 Answer

The simplest coding pattern for this is try/finally:

import os
import tempfile

fd, pathname = tempfile.mkstemp()
try:
    dostuff(fd)
finally:
    os.close(fd)

However, if you're doing this more than once, it's trivial to wrap it up in a context manager:

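import contextlib
import os
import tempfile

@contextlib.contextmanager
def mkstemping(*args):
    # pathname is not exposed here; yield (fd, pathname) if callers need it
    fd, pathname = tempfile.mkstemp(*args)
    try:
        yield fd
    finally:
        os.close(fd)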

And then you can just do:

with mkstemping() as fd:
    dostuff(fd)

If you really want to, of course, you can always wrap the fd up in a file object (by passing it to open, or os.fdopen in older versions). But… why go to the extra trouble? If you want an fd, use it as an fd.
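
For readers who do want a file object, here is a minimal sketch of that wrapping. The helper name `write_temp` and its `suffix` parameter are hypothetical, not part of any library. The sketch relies on the documented behavior that a file object created by `os.fdopen` closes the underlying descriptor when it is closed, so no separate `os.close` call is needed.

```python
import os
import tempfile


def write_temp(data, suffix=""):
    """Write bytes to a new mkstemp file and return its path (hypothetical helper)."""
    fd, pathname = tempfile.mkstemp(suffix=suffix)
    # os.fdopen wraps the descriptor in a file object; leaving the with block
    # closes the file object and the descriptor together, even on an exception.
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return pathname
```

The file is not deleted automatically, so the caller is responsible for calling `os.remove(pathname)` when done.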

And if you don't want an fd, unless you have a good reason that you need mkstemp instead of the simpler and higher-level NamedTemporaryFile, you shouldn't be using the low-level API. Just do this:

with tempfile.NamedTemporaryFile(delete=False) as f:
    dostuff(f)

Besides being simpler to use in a with statement, this also has the advantage that it's already a Python file object instead of just an OS file descriptor (and, in Python 3.x, it can be a Unicode text file).
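
As a rough illustration of what `delete=False` implies, the sketch below writes a small payload, leaves the with block, and shows that the file is still on disk afterwards. The `.xml` suffix and the payload are made up for the example. The default mode is `"w+b"`, so the data written must be bytes.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as f:
    f.write(b"<root/>")      # default mode is "w+b", so write bytes
    saved_path = f.name      # remember the path before the file is closed

# The file object is closed here, but with delete=False the file remains.
with open(saved_path, "rb") as check:
    contents = check.read()

# Cleanup is now the caller's job.
os.remove(saved_path)
```

With the default `delete=True`, the file would already be gone when the with block exits, which is fine when everything happens inside the block.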


An even simpler solution is to avoid the tempfile completely.

Almost all XML parsers have a way to parse a string instead of a file. With cElementTree, it's just a matter of calling fromstring instead of parse. So, instead of this:

req = requests.get(url)
with tempfile.NamedTemporaryFile() as f:
    f.write(req.content)
    f.seek(0)
    tree = ET.parse(f)

… just do this:

req = requests.get(url)
tree = ET.fromstring(req.content)

Of course the first version only needs to hold the XML document and the parsed tree in memory one after the other, while the second needs to hold them both at once, so this may increase your peak memory usage by about 30%. But this is rarely a problem.

If it is a problem, many XML libraries have a way to feed in data as it arrives, and many downloading libraries have a way to stream data bit by bit—and, as you might imagine, this is again true for cElementTree's XMLParser and for requests in a few different ways. For example:

req = requests.get(url, stream=True)
parser = ET.XMLParser()
for chunk in iter(lambda: req.raw.read(8192), b''):
    parser.feed(chunk)
tree = parser.close()

Not quite as simple as just using fromstring… but it's still simpler than using a temporary file, and probably more efficient to boot.

If that use of the two-argument form of iter confuses you (a lot of people seem to have trouble grasping it at first), you can rewrite it as:

req = requests.get(url, stream=True)
parser = ET.XMLParser()
while True:
    chunk = req.raw.read(8192)
    if not chunk:
        break
    parser.feed(chunk)
tree = parser.close()
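
To try the incremental-feed idea without a network connection, here is a small sketch in which an `io.BytesIO` stands in for `req.raw`. The function name `parse_in_chunks` and the sample document are hypothetical. It uses the standard `xml.etree.ElementTree` module and a bytes sentinel, since binary reads return `b''` at end of stream.

```python
import io
import xml.etree.ElementTree as ET


def parse_in_chunks(stream, chunk_size=8192):
    """Feed a binary stream to XMLParser piece by piece (hypothetical helper)."""
    parser = ET.XMLParser()
    # Two-argument iter: call stream.read repeatedly until it returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        parser.feed(chunk)
    # close() finishes parsing and returns the root element.
    return parser.close()


# BytesIO plays the role of the HTTP response body in this sketch.
fake_body = io.BytesIO(b"<items><item>a</item><item>b</item></items>")
root = parse_in_chunks(fake_body, chunk_size=8)
names = [item.text for item in root.findall("item")]
```

The tiny `chunk_size` is only there to force several `feed` calls, including ones that split a tag in the middle.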
abarnert
  • I like the simplicity of `NamedTemporaryFile`. I'm just going to fetch an XML (sorry! not my choice) resource into the tempfile, then parse and validate it, then the tempfile can go away. – Ray Toal Jan 03 '14 at 21:43
  • @RayToal: Do you actually even need a temp file? Most XML parsers can parse a string and/or let you feed data in as it arrives… – abarnert Jan 03 '14 at 22:30
  • Good question. I'm downloading an xml file with the [requests](http://docs.python-requests.org/en/latest/) module and parsing it with [cElementTree](http://effbot.org/zone/celementtree.htm). Yes I would love to be able to have the downloaded HTTP response be treated as a stream which is then read by the cElementTree parser. I tried to figure that out first before giving up and switching to a temp file. – Ray Toal Jan 03 '14 at 22:36
  • @RayToal: If you really do need to stream, see [Body Content Workflow](http://www.python-requests.org/en/latest/user/advanced/#body-content-workflow) for how to stream data iteratively from requests, and [`XMLParser` Objects](http://docs.python.org/2/library/xml.etree.elementtree.html#xmlparser-objects) for how to feed data iteratively to etree. – abarnert Jan 03 '14 at 22:51
  • @RayToal: See my edited answer for how to do this without using a file at all. – abarnert Jan 03 '14 at 23:01