19

Historically I have always used the following for reading files in python:

with open("file", "r") as f:
    for line in f:
        pass  # do thing to line

Is this still the recommended approach? Are there any drawbacks to using the following:

from pathlib import Path

path = Path("file")
for line in path.open():
    pass  # do thing to line

Most of the references I found use the with keyword when opening files, for the convenience of not having to close the file explicitly. Does that also apply to the iterator approach here?

with open() docs

Dinko Pehar
Ben Carley
  • 4
    Well, yes. You’re still missing the file-closing part in the second example. Not sure why there’s a `Path` difference – `with path.open() as f:` would be the equivalent for the first example, or `for line in open("file", "r"):` for the second. – Ry- Apr 24 '20 at 09:57
  • You should explicitly call `path.close()` in the second example – Rajarishi Devarajan Apr 24 '20 at 09:59
  • 3
    The [`Path.open()`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.open) docs itself uses an example that still uses `with`. – Gino Mempin Apr 24 '20 at 10:13
  • 1
    `3 - 0` - I'll keep using `with ...` @GinoMempin - Completely missed that in the docs, thank you. @RishiDev - there is no `path.close()` method as path.open is returning a file object @Ry- - Is the file not closed by `python` when it is cleared up at the end of the loop? There is not any variable left to call `close()` on? – Ben Carley Apr 24 '20 at 13:00
  • It’s not guaranteed by the language. CPython’s refcounting *will* close the file right after the loop, but it’s best to be interoperable with other Python implementations and continue using `with` for that predictable, deterministic close. (Also, I don’t know if a reference hangs around for a bit on CPython if an exception is thrown inside the loop.) – Ry- Apr 25 '20 at 02:26

3 Answers

22

Something that wasn't mentioned yet: if all you want to do is read or write some text (or bytes), you no longer need to use the context manager explicitly when using pathlib:

>>> import pathlib
>>> path = pathlib.Path("/tmp/example.txt")
>>> path.write_text("hello world")
11
>>> path.read_text()
'hello world'
>>> path.read_bytes()
b'hello world'
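
If the goal is only to loop over lines, the same convenience methods can be combined with `splitlines()`. The block below is a minimal sketch under a few assumptions: the file is small enough to read into memory at once, the temporary file name is made up, and `read_text_sketch` is a hypothetical helper that only illustrates the open, read, close pattern (it is not pathlib's actual source).

```python
import tempfile
from pathlib import Path


def read_text_sketch(path: Path, encoding: str = "utf-8") -> str:
    """Hypothetical stand-in for Path.read_text: open, read, then close."""
    with path.open(mode="r", encoding=encoding) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"  # hypothetical file name
    path.write_text("first\nsecond\n", encoding="utf-8")

    # Reads the whole file into memory, then splits it into lines.
    lines = path.read_text(encoding="utf-8").splitlines()

    # The helper returns the same text as the built-in convenience method.
    same_text = read_text_sketch(path) == path.read_text(encoding="utf-8")
```

For large files, iterating over an open file object inside a with-statement avoids loading everything at once.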

Opening a file to iterate lines should still use a with-statement, for all the same reasons as using the context manager with open, as the docs show:

>>> with path.open() as f:
...     for line in f:
...         print(line)
...
hello world
wim
  • 2
    Excellent point, the code for these functions already uses `with self.open` behind the scenes :-) – Sofie VL Aug 10 '21 at 12:25
7

Keep in mind that a Path object is for working with filesystem paths. Just as Python has a built-in open function but no built-in close function, a Path object has an open method but no close method.

The close method belongs to the file object that is returned by either the built-in open or the Path object's open method:

>>> from pathlib import Path
>>> p=Path(some_file)
>>> p
PosixPath('/tmp/file')

You can open that Path object either with the built-in open function or the open method in the Path object:

>>> fh=open(p)    # open built-in function
>>> fh
<_io.TextIOWrapper name='/tmp/file' mode='r' encoding='UTF-8'>
>>> fh.close()

>>> fh=p.open()   # Path.open method, which returns the same kind of file object as the built-in open
>>> fh
<_io.TextIOWrapper name='/tmp/file' mode='r' encoding='UTF-8'>
>>> fh.close()

You can have a look at the source code for pathlib on GitHub as an indication of how the authors of pathlib do it in their own code.

What I observe is one of three things.

The most common by far is to use with:

from pathlib import Path 

p=Path('/tmp/file')

#create a file
with p.open(mode='w') as fi:
    fi.write(f'Insides of: {str(p)}')

# read it back and test open or closed
with p.open(mode='r') as fi:
    print(f'{fi.read()} closed?:{fi.closed}')

# prints 'Insides of: /tmp/file closed?:False'

As you likely know, at the end of the with block the context manager's __exit__ method is called. For a file, that means the file is closed. This is the most common approach in the pathlib source code.

Second, you can also see in the source that a pathlib object tracks its own entry and exit status, along with a flag for whether it is open or closed. The os.close function is not explicitly called, however. For a file object, you can check that status with its .closed attribute.

fh=p.open()
print(f'{fh.read()} closed?:{fh.closed}')
# prints Insides of: /tmp/file closed?:False    
# fh will only be closed when fh goes out of scope...
# or you could (and should) do fh.close()


with p.open() as fi:
    pass
print(f'closed?:{fi.closed}')   
# fi still in scope but implicitly closed at the end of the with block
# prints closed?:True

Third, on CPython, a file is closed when the last reference to its file object goes away, for example when the file handle goes out of scope. Relying on this is not portable and is not considered good practice, although it is commonly done. There are instances of this in the pathlib source code.
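
To illustrate why `with` is the more predictable choice, here is a minimal sketch showing that the file is closed even when the loop body raises. The temporary file, its contents, and the simulated `ValueError` are all assumptions made for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"  # hypothetical file name
    path.write_text("one\ntwo\n", encoding="utf-8")

    handle = None
    try:
        with path.open(encoding="utf-8") as f:
            handle = f  # keep a reference so the state can be inspected later
            for line in f:
                # Simulated failure while processing the first line.
                raise ValueError("simulated failure while processing a line")
    except ValueError:
        pass

    # The with-statement closed the file on the way out, despite the error.
    closed_after_error = handle.closed
```

With a bare `for line in path.open():` loop, when the file gets closed after such an error depends on the interpreter's garbage collection rather than on the language.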

dawg
  • 'sup @dawg - thank you, this is a really comprehensive answer. I think me throwing references in to `pathlib` there was a bit of a red herring but the answer seems to be that I would be relying on an implementation detail rather than a language feature. – Ben Carley Apr 26 '20 at 19:11
1

pathlib provides an object-oriented way of manipulating filesystem paths.

The recommended way of opening a file with the pathlib module is to use a context manager:

p = Path("my_file.txt")

with p.open() as f:
    f.readline()

This ensures that the file is closed after use.


In your second example, the file is never explicitly closed, because it is opened in place and no reference to it is kept on which close() could be called.

Since path.open() returns a file object, you can verify this by assigning it to a variable and checking its closed attribute, like so:

from pathlib import Path

path = Path("file.txt")

# Open the file pointed by this path and return a file object, as
# the built-in open() function does.
f = path.open()
for line in f:
    pass  # do some stuff

print(f.closed)  # Evaluates to False.
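
If a with-statement is not an option for some reason, the explicit equivalent is try/finally. This is a minimal sketch, assuming a throwaway temporary file; the file name and contents are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "file.txt"  # hypothetical file name
    path.write_text("a\nb\n", encoding="utf-8")

    f = path.open(encoding="utf-8")
    try:
        # Iterate the open file object line by line.
        stripped = [line.rstrip("\n") for line in f]
        closed_inside = f.closed  # still open at this point
    finally:
        # Runs whether or not the loop raised, so the file is always closed.
        f.close()
    closed_after = f.closed
```

The with-statement does the same thing in fewer lines, which is why it is usually preferred.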

Dinko Pehar