Suppose I'm writing code using pathlib and I want to iterate over all the entries directly inside a directory (not recursively).

I can do this in two ways:

import pathlib

p = pathlib.Path('/some/path')
for f in p.iterdir():
    print(f)

p = pathlib.Path('/some/path')
for f in p.glob('*'):
    print(f)

Is one of the options better in any way?

jpf
kaki gadol
  • 1
    It's easy to find the difference if you look at the docs for [`glob()`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob) and [`iterdir()`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.iterdir). Use the one that fits your requirements better. – Olvin Roght Jul 20 '20 at 18:16
  • I know that, but using `glob('*')` like this produces the same output as `iterdir()`. I was wondering whether one of them is better for this use case – kaki gadol Jul 20 '20 at 18:52
  • 2
    Why put the API to extra work parsing and testing against a filter pattern when you could just... not? – ShadowRanger Jul 20 '20 at 18:53
  • Well, @ShadowRanger that might be the answer I was looking for.. :) – kaki gadol Jul 20 '20 at 18:53
  • 1
    This is a valid question. Generally it's a sign of bad design if an interface is more complex than necessary. If it's about performance, `glob('*')` could be implemented as a special case, or `glob()` without arguments could iterate over all files. – Seppo Enarvi Feb 26 '21 at 11:11

2 Answers

Expansion of my comment: Why put the API to extra work parsing and testing against a filter pattern when you could just... not?

`glob` is better when you need the filtering feature and the filter is simple and string-based, because it simplifies the work. Sure, hand-writing simple matches (filtering `iterdir` via `if path.name.endswith('.txt'):` instead of `glob('*.txt')`) might be more efficient than the regex-based pattern matching `glob` hides, but it's generally not worth reinventing the wheel given that disk I/O is orders of magnitude slower.
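To make the comparison above concrete, here is a minimal sketch of both approaches. The helper names `txt_files_glob` and `txt_files_iterdir` and the file names are made up for illustration; note that `Path` objects have no `endswith()` method, so the hand-written filter tests `path.name`.

```python
import pathlib
import tempfile


def txt_files_glob(directory):
    """Let glob() do the pattern matching."""
    return sorted(directory.glob('*.txt'))


def txt_files_iterdir(directory):
    """Filter iterdir() by hand, testing the file name as a string."""
    return sorted(p for p in directory.iterdir() if p.name.endswith('.txt'))


# Build a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    for name in ('a.txt', 'b.txt', 'c.log'):
        (root / name).touch()

    print([p.name for p in txt_files_glob(root)])
    print([p.name for p in txt_files_iterdir(root)])
```

For lowercase names like these, both helpers should pick the same files. The hand-written check is case-sensitive everywhere, whereas glob matching may follow the platform's case rules, so the two are not guaranteed to agree on every file system.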

But if you don't need the filtering functionality at all, don't use it. `glob` gains you nothing in code simplicity or functionality and costs some performance, so just use `iterdir`.
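As a quick sanity check that `iterdir()` is a drop-in replacement here, the sketch below compares the two calls on a temporary directory. The helper `same_entries` and the file names are hypothetical; the directory includes a dotfile and a nested file on purpose, since pathlib's `glob('*')` does not treat leading dots specially and neither call recurses.

```python
import pathlib
import tempfile


def same_entries(directory):
    """Compare the direct children reported by iterdir() and glob('*')."""
    return set(directory.iterdir()) == set(directory.glob('*'))


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / 'visible.txt').touch()
    (root / '.hidden').touch()          # dotfile, to check it is not skipped
    (root / 'subdir').mkdir()
    (root / 'subdir' / 'nested.txt').touch()  # must not show up in either listing

    print(same_entries(root))
```

Sets are compared rather than lists because neither method promises a particular order.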

ShadowRanger

In addition to the excellent existing answer, there's at least one difference in behavior:

If the directory doesn't exist, iterdir() raises a FileNotFoundError. glob('*') treats this case like an empty folder, returning an empty iterable.

>>> import pathlib
>>> path = pathlib.Path('/some/path')
>>> list(path.glob('*'))
[]
>>> list(path.iterdir())
Traceback (most recent call last):
  [...]
FileNotFoundError: [Errno 2] No such file or directory: '/some/path'
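If you prefer `iterdir()` but want a missing directory to behave like an empty one, one option is to catch the exception yourself. This is only a sketch: `list_entries` is a hypothetical helper, and whether swallowing the error is appropriate depends on your application.

```python
import pathlib
import tempfile


def list_entries(directory):
    """Return the directory's entries, treating a missing directory as empty."""
    try:
        # The error may surface either at the call or during iteration,
        # so both happen inside the try block.
        return sorted(directory.iterdir())
    except FileNotFoundError:
        return []


with tempfile.TemporaryDirectory() as tmp:
    missing = pathlib.Path(tmp) / 'does-not-exist'  # hypothetical name
    print(list(missing.glob('*')))
    print(list_entries(missing))
```

Conversely, if a missing directory should be treated as a bug, plain `iterdir()` surfaces it immediately, while `glob('*')` would hide it.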
flornquake