22

Suppose I have a subdirectory of symlinks that looks like the following:

subdir/
    folder/
        readme.txt
    symlink/ => ../hidden/
hidden/
    readme.txt

If I run the following code:

>>> from pathlib import Path
>>> list(Path('./subdir/').glob('**/readme.txt'))

I would expect the outcome to be:

subdir/folder/readme.txt
subdir/symlink/readme.txt

But the actual result is:

subdir/folder/readme.txt

I found out that this is because (for some undocumented reason) the ** operator doesn't follow symlinks.

Is there a way to change this behavior programmatically?

bashaus
  • That's strange, since there's an [open issue](https://bugs.python.org/issue29475) asking for `glob` to optionally *not* follow symlinks. – Jeremy McGibbon Oct 02 '17 at 16:44
  • @JeremyMcGibbon you can never please everyone, hey? – bashaus Oct 02 '17 at 16:45
  • This behavior seems to have been caused as a side effect of fixing this issue: https://bugs.python.org/issue26012. The method `_iterate_directories()` in class `_RecursiveWildcardSelector` of pathlib.py explicitly ignores symlinks. – Jan Wilamowski Feb 24 '22 at 07:28
  • Probably of interest: https://github.com/python/cpython/issues/77609#issuecomment-1567306837 – Warpig Jun 12 '23 at 12:09

3 Answers

18

Path.glob also doesn't follow symlinks with ** for me. I found a related issue: https://bugs.python.org/issue33428.

As an alternative on Python 3, you could use glob.glob with ** and the recursive=True option (see https://docs.python.org/3/library/glob.html for details):

In [67]: from glob import glob
In [71]: list(glob("./**/readme.txt", recursive=True))
Out[71]:
['./hidden/readme.txt',
 './subdir/folder/readme.txt',
 './subdir/symlink/readme.txt']

In [73]: list(glob("./**/readme.txt", recursive=False))
Out[73]: ['./hidden/readme.txt']

Compare to:

In [72]: list(Path('.').glob('**/readme.txt'))
Out[72]: [PosixPath('hidden/readme.txt'), PosixPath('subdir/folder/readme.txt')]
1

I've never used pathlib before, so you may extend this solution to take advantage of some of its features, but I got this to work using glob only.

from glob import glob
list(glob('./subdir/*/readme.txt'))

Output:

['./subdir/folder/readme.txt', './subdir/symlink/readme.txt']

If you're set on using glob with more than one depth of subdirectory, the hackish solution is to include variations with extra */ (e.g. ./subdir/*/*/*/readme.txt) up to some arbitrary depth, and concatenate the results from each variation.
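A minimal sketch of that hackish approach, in case it helps. The function name `glob_to_depth` and its `max_depth` parameter are made up for illustration; only `glob.glob` and `os.path.join` are real library calls.

```python
import os
from glob import glob


def glob_to_depth(root, filename, max_depth):
    """Collect files named `filename` that sit 1..max_depth directories below `root`.

    Hypothetical helper: it builds one pattern per depth (root/*/filename,
    root/*/*/filename, ...) and concatenates the results.
    """
    matches = []
    for depth in range(1, max_depth + 1):
        # One "*" per intermediate directory level.
        pattern = os.path.join(root, *["*"] * depth, filename)
        matches.extend(glob(pattern))
    return matches
```

For the layout in the question, `glob_to_depth('./subdir', 'readme.txt', 3)` should pick up both the real folder and the symlinked one, because a plain `*` matches the symlink by name. Anything deeper than `max_depth` is silently missed, which is why this is only a stopgap.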

The more appropriate way to do what you want would be to write a custom function that has the behavior you want (searches through symlinks to arbitrary depth), and handles the case of circular paths in the way you want. See this question for tips on doing this with os.walk (remember to set followlinks=True).
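Here is a rough sketch of such a custom function, assuming you want each real directory visited at most once. The name `find_files` is hypothetical; `os.walk(..., followlinks=True)` and `os.path.realpath` are the standard library calls doing the work.

```python
import os


def find_files(root, filename):
    """Yield paths named `filename` under `root`, following symlinked directories.

    Hypothetical helper. Each real directory is visited only once, which
    breaks symlink loops but also means a directory reachable by two
    different routes is reported only via the first route walked.
    """
    seen = set()  # resolved paths of directories already visited
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        real = os.path.realpath(dirpath)
        if real in seen:
            dirnames[:] = []  # prune in place so os.walk does not descend again
            continue
        seen.add(real)
        if filename in filenames:
            yield os.path.join(dirpath, filename)
```

With the layout from the question, `list(find_files('./subdir', 'readme.txt'))` should return both readme.txt paths. If you would rather see a file once per route that reaches it, you would need to track only the ancestors of the current directory instead of every directory seen so far.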

Jeremy McGibbon
  • `Path.glob` and `glob` use the same functionality - the issue I have found is that glob doesn't like following symlinks when using `**`. – bashaus Oct 02 '17 at 16:54
  • Is there a reason you can't use `*` instead of `**`, since your directory structure has a specific depth? Interesting to note that [`glob` in javascript](https://www.npmjs.com/package/glob) has the same behavior for `**`. – Jeremy McGibbon Oct 02 '17 at 17:08
  • I am guessing the reason for this behavior is to avoid issues with infinite depth in the case of symlinks that create directory loops. – Jeremy McGibbon Oct 02 '17 at 17:12
  • I'm trying to write a solution abstract enough that it can be used in a few different scenarios. There will be instances where the readme.txt file may be buried deep. – bashaus Oct 02 '17 at 17:31
  • Updated solution to reflect your need for deeper paths. – Jeremy McGibbon Oct 02 '17 at 17:37
1

On Python 3.6, a call to rglob works for me:

import pathlib

p = pathlib.Path("./subdir").rglob("readme.txt")
jxramos