
I'm trying to stream a file to clients with Python, and I need to add two HTTP header fields to the response: Content-Length and Last-Modified. I found that I can get these values for the file using os.fstat, which returns an os.stat_result object whose st_size and st_mtime attributes I can use in the response headers.

os.fstat takes a file descriptor, which os.open provides. This works:

import os

file_name = "file.cab"

fd = os.open(file_name, os.O_RDONLY)
stats = os.fstat(fd)

print("Content-Length", stats.st_size) # Content-Length 27544
print("Last-Modified", stats.st_mtime) # Last-Modified 1650348549.6016183
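Note that st_mtime is a raw Unix timestamp, while the Last-Modified header expects an HTTP-date in GMT. A minimal sketch of turning the stat_result fields into header values with the standard library's email.utils.formatdate; the helper name http_headers_for is hypothetical, and a temporary file stands in for file.cab:

```python
import os
import tempfile
from email.utils import formatdate


def http_headers_for(fd):
    """Build Content-Length and Last-Modified values from an open file descriptor.

    http_headers_for is a hypothetical helper, not part of any library.
    """
    stats = os.fstat(fd)
    return {
        "Content-Length": str(stats.st_size),
        # usegmt=True produces an HTTP-date such as "Sun, 06 Nov 1994 08:49:37 GMT".
        "Last-Modified": formatdate(stats.st_mtime, usegmt=True),
    }


# The timestamp from the question, formatted the way the header expects:
print(formatdate(1650348549.6016183, usegmt=True))  # Tue, 19 Apr 2022 06:09:09 GMT

# Usage with a throwaway file standing in for file.cab:
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x82\xff binary payload")
fd = os.open(tmp.name, os.O_RDONLY)
try:
    headers = http_headers_for(fd)
finally:
    os.close(fd)
    os.remove(tmp.name)
print(headers)
```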

Now to actually open this file and have a file object (so I can read and stream it), I can use os.fdopen, which takes the file descriptor provided by os.open.

f = os.fdopen(fd)
print(f) # <_io.TextIOWrapper name=3 mode='r' encoding='UTF-8'>

We can see that the returned object is a text wrapper with its encoding set to UTF-8. However, when I try to read the file, it raises an error:

print(f.read())
Traceback (most recent call last):
  File "{redacted}/stream.py", line 10, in <module>
    print(f.read())
  File "/usr/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 60: invalid start byte

There is a flag called os.O_BINARY, but the os documentation says:

The above constants are only available on Windows.

And sure enough, since I'm running on a Unix machine, if I execute os.open with this flag, it gives an AttributeError:

fd = os.open(file_name, os.O_RDONLY | os.O_BINARY)

Traceback (most recent call last):
  File "{redacted}/stream.py", line 5, in <module>
    fd = os.open(file_name, os.O_RDONLY | os.O_BINARY)
AttributeError: module 'os' has no attribute 'O_BINARY'
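A common idiom for cross-platform code is to fall back to 0 when the constant is missing: POSIX kernels make no text/binary distinction, so OR-ing in 0 changes nothing there, and whether you get bytes or str is decided by the mode later passed to os.fdopen. A sketch; the helper open_readonly_binary is hypothetical:

```python
import os
import tempfile

# os.O_BINARY only exists on Windows; on POSIX this falls back to 0 (a no-op flag).
BINARY_FLAG = getattr(os, "O_BINARY", 0)


def open_readonly_binary(path):
    """Hypothetical helper: return a read-only fd (binary at the CRT level on Windows)."""
    return os.open(path, os.O_RDONLY | BINARY_FLAG)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x82\x00\xff")

fd = open_readonly_binary(tmp.name)
with os.fdopen(fd, "rb") as f:  # closing f also closes fd
    data = f.read()
os.remove(tmp.name)
print(data)  # b'\x82\x00\xff'
```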

So is it possible to open a binary file with os.open and os.fdopen on Unix?

Note that this problem doesn't occur if I just use the built-in open function:

file_name = "file.cab"

f = open(file_name, 'rb')
print(f) # <_io.BufferedReader name='file.cab'>
print(f.read()) # prints the file's raw bytes in my terminal

But I have to open it with the os module, because I need to provide those HTTP header fields I mentioned.

Edit: As tripleee mentioned, this is an example of an XY problem. I can get the result I want with os.stat, which accepts a plain file path instead of a file descriptor. So I can do something like this:

import os

file_name = "file.cab"

f = open(file_name, 'rb')
stats = os.stat(file_name)

print(f) # <_io.BufferedReader name='file.cab'>
print(stats) # os.stat_result(...)
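If you would rather stat the file you actually have open (so the size and timestamp describe the same file you stream, even if the path is replaced in between), the object returned by open() exposes its descriptor through fileno(), which os.fstat accepts. A sketch, with a temporary file standing in for file.cab:

```python
import os
import tempfile

# Throwaway file standing in for file.cab (260 bytes of binary data).
with tempfile.NamedTemporaryFile(delete=False, suffix=".cab") as tmp:
    tmp.write(b"MSCF" + bytes(range(256)))
file_name = tmp.name

with open(file_name, "rb") as f:
    stats = os.fstat(f.fileno())  # stat the open file, not the path
    body = f.read()

os.remove(file_name)
print(stats.st_size, len(body))  # 260 260
```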

So at this point, I'm only wondering how, or if, it's possible to do the same with os.open and os.fdopen.

Amir Shabani
    This is interesting as such, but your actual question seems to be an [XY Problem](https://en.wikipedia.org/wiki/XY_problem). There are other and generally better ways to get the modification time and size of a file. – tripleee Apr 19 '22 at 06:37
  • @tripleee Yes, I figured that could be the case, so I also provided the reason that I'm doing this. But I find it odd if it's not possible. – Amir Shabani Apr 19 '22 at 06:40
  • The `os` documentation contains this snippet: *"`open()`, `io.open()`, and `codecs.open()` use the UTF-8 encoding by default. However, they still use the strict error handler by default so that attempting to open a binary file in text mode is likely to raise an exception rather than producing nonsense data."* – tripleee Apr 19 '22 at 06:40
  • @tripleee Note that the snippet quoted is only true if UTF-8 Mode is activated. See [PEP-540](https://peps.python.org/pep-0540/) for details. – Mark Tolonen Apr 19 '22 at 20:09

1 Answer

Just tell os.fdopen() to open in binary mode:

f = os.fdopen(fd, 'rb')
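Applied to the question's goal, a sketch that uses a single os.open call, takes the headers from os.fstat, and then wraps the same descriptor in binary mode and reads it in chunks. The helper open_for_streaming and the 64 KiB chunk size are assumptions for illustration, not part of any web framework's API:

```python
import os
import tempfile
from email.utils import formatdate

CHUNK_SIZE = 64 * 1024  # assumed chunk size for illustration


def open_for_streaming(path):
    """Return (headers, chunk_iterator) using os.open, os.fstat and os.fdopen.

    open_for_streaming is a hypothetical helper, not a framework API.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        stats = os.fstat(fd)
        f = os.fdopen(fd, "rb")  # binary mode: read() returns bytes, no decoding
    except BaseException:
        os.close(fd)
        raise
    headers = {
        "Content-Length": str(stats.st_size),
        "Last-Modified": formatdate(stats.st_mtime, usegmt=True),
    }

    def chunks():
        # Closing f also closes fd; this happens once the body is fully consumed.
        with f:
            while chunk := f.read(CHUNK_SIZE):
                yield chunk

    return headers, chunks()


# Usage: 0x82 bytes are not valid UTF-8, so text mode would fail on this file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x82" * 150_000)
headers, body = open_for_streaming(tmp.name)
received = b"".join(body)
os.remove(tmp.name)
print(headers["Content-Length"], len(received))  # 150000 150000
```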

Notice the hint in the os.fdopen documentation ...

This is an alias of the open() built-in function and accepts the same arguments.

... so the open() documentation applies, in particular for the mode parameter:

'r' open for reading (default)

'b' binary mode

Here's a full program to illustrate the difference:

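#!/usr/bin/env python3
import os

filepath = "utf8.txt"

# Two separate descriptors, so each file object owns (and closes) its own fd.
# O_CREAT creates the file if needed; O_RDONLY is enough because we only read.
fd1 = os.open(filepath, os.O_CREAT | os.O_RDONLY, 0o644)
fd2 = os.open(filepath, os.O_RDONLY)

fo1 = os.fdopen(fd1)        # default mode 'r': text mode, decodes bytes to str
fo2 = os.fdopen(fd2, 'rb')  # binary mode: returns raw bytes
print(fo1)
print(fo2)

fo1.close()
fo2.close()

Result (the text encoding shown depends on your locale):

<_io.TextIOWrapper name=3 mode='r' encoding='UTF-8'>
<_io.BufferedReader name=4>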
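To see the difference in behavior and not only in the repr, here is a sketch that writes a byte which is not valid UTF-8 (the same 0x82 from the question's traceback) and reads it back through both kinds of wrapper. Passing encoding="utf-8" explicitly is an assumption made so the text-mode result does not depend on the locale:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc\x82def")
path = tmp.name

# Binary mode: the raw bytes come back untouched.
with os.fdopen(os.open(path, os.O_RDONLY), "rb") as f:
    raw = f.read()

# Text mode: bytes are decoded, and 0x82 is not a valid UTF-8 start byte.
try:
    with os.fdopen(os.open(path, os.O_RDONLY), "r", encoding="utf-8") as f:
        f.read()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

os.remove(path)
print(raw, decode_failed)  # b'abc\x82def' True
```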

PS: I ran into this problem when trying to save an image using PIL. The Image.save() method also accepts a file object instead of a filename, and that file object also has to be opened in binary mode.

ChristophK