16

If I'm given a path as a string, such as "~/pythoncode/*.py" what is the best way to glob it in pathlib?

Using pathlib, there is a way of appending to a path using a glob:

p = pathlib.Path('~/pythoncode/').expanduser().glob('*.py')

but this, for example, does not work because the user isn't expanded:

p = pathlib.Path().glob('~/pythoncode/*.py')

and this raises an exception because I'm providing no arguments to glob():

p = pathlib.Path('~/pythoncode/*.py').expanduser().glob()

Is there a way to do this in pathlib, or must I parse the string first?

Omegaman
  • I think your question answers itself, but I could be wrong – Mad Physicist Jun 29 '18 at 19:31
  • Look into `PurePath.parts` – Mad Physicist Jun 29 '18 at 19:32
  • I believe `os.path.expanduser()` is a no-op passthrough if there is nothing to expand, so you might be able to do something like `Path().glob(os.path.expanduser('~/pythoncode/*.py'))` – jedwards Jun 29 '18 at 19:33
  • @jedwards: `NotImplementedError: Non-relative patterns are unsupported` – Mad Physicist Jun 29 '18 at 19:37
  • @MadPhysicist: I think it only answers itself if I'm not overlooking something-- which seemed likely with my lack of experience with this library. `pathlib` is quite complete, I was hoping that with the right call order it had a way of more closely mimicking the shell and doing a full expansion. – Omegaman Jun 29 '18 at 23:26

3 Answers

15

If you're starting from the string "~/pythoncode/*.py" and you'd like to expand and glob, you will need to split the path first. Luckily pathlib provides .name and .parent to help you out:

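from pathlib import Path
from typing import Iterable

def expandpath(path_pattern) -> Iterable[Path]:
    # Split off the last component: the directory is expanded, the name is the glob.
    p = Path(path_pattern)
    return p.parent.expanduser().glob(p.name)

expandpath("~/pythonpath/*.py")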
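To make the split concrete, here is a small sketch of what .parent and .name return for the pattern from the question. It touches no files, and the variable names are illustrative only.

```python
from pathlib import PurePosixPath

# PurePosixPath keeps the result identical on every platform;
# a concrete Path splits the same way on POSIX systems.
pattern = PurePosixPath("~/pythoncode/*.py")

directory = pattern.parent  # the part that expanduser() has to resolve
wildcard = pattern.name     # the part that is handed to glob()

print(directory)  # ~/pythoncode
print(wildcard)   # *.py
```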

Note that this simple solution works only when the glob is confined to the name; it fails when other parts of the path contain wildcards, as in ~/python*/*.py. A more general, slightly more complex solution:

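def expandpath(path_pattern) -> Iterable[Path]:
    p = Path(path_pattern).expanduser()
    # is_absolute() is a bool, so this slice drops the anchor of an absolute path only.
    parts = p.parts[p.is_absolute():]
    # Glob relative to the anchor (root plus any drive), or to the cwd if p is relative.
    return Path(p.anchor).glob(str(Path(*parts)))

expandpath("~/python*/*.py")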

Note 2: the function above fails with IndexError: tuple index out of range on these degenerate paths: '', '.', '/'
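One possible way around the degenerate paths listed above is to check whether anything is left to glob once the anchor has been removed. The sketch below assumes a POSIX system; the name expandpath_safe and the choice to yield the path itself in that case are assumptions, not part of the answer.

```python
from pathlib import Path
from typing import Iterator


def expandpath_safe(path_pattern: str) -> Iterator[Path]:
    """Hypothetical wrapper: expand ~ and glob, tolerating '', '.' and '/'."""
    p = Path(path_pattern).expanduser()
    # Everything after the anchor ('/' on POSIX, '' for relative paths).
    parts = p.parts[1:] if p.is_absolute() else p.parts
    if not parts:
        # Nothing left to match against: yield the path itself if it exists.
        if p.exists():
            yield p
        return
    yield from Path(p.anchor).glob(str(Path(*parts)))
```

With this guard, list(expandpath_safe("/")) yields the root directory itself instead of raising, while patterns that do contain components are globbed as before.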

ankostis
Matthew Story
  • This seems to work as long as the wildcards are in the `name` portion of the path as in my example. In the general case, would it be better using `expanduser()` and then iterating over `parts()` as suggested by @MadPhysicist, or `parents()` then doing a merge? – Omegaman Jun 29 '18 at 23:21
  • @Omegaman good call. Thankfully iterating and merging is not necessary. I've appended a slightly more complicated version to my original solution that solves this generally for relative and absolute paths with globs at many levels. – Matthew Story Jun 30 '18 at 00:09
  • The 2nd solution is good, it's just funny that this isn't implemented in `pathlib` to start with, when I would expect it to be a very common use case of globs... does anyone care to open a feature request? – smheidrich Jul 06 '18 at 11:18
  • Booooo pathlib! – Alex Kreimer Dec 22 '19 at 12:18
8

pathlib.Path.glob does not support absolute (non-relative) path patterns, but glob.glob does:

from glob import glob
from pathlib import Path

paths = [Path(p) for p in glob('/foo/*/bar')]

Or in connection with Path.expanduser:

paths = [Path(p) for p in glob(str(Path('~/.bash*').expanduser()))]
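The two snippets above could be folded into one helper, sketched here under a few assumptions: the name glob_expanded is made up, and os.path.expanduser (mentioned in the comments on the question) is used in place of Path.expanduser so the pattern string reaches glob unchanged apart from the ~. The recursive flag is passed straight through to glob.glob.

```python
import os.path
from glob import glob
from pathlib import Path
from typing import List


def glob_expanded(pattern: str, recursive: bool = False) -> List[Path]:
    """Hypothetical helper: expand a leading ~, then glob the full pattern."""
    # os.path.expanduser leaves the string untouched when there is no ~ to expand.
    expanded = os.path.expanduser(pattern)
    # recursive=True lets a "**" component match any number of directories.
    return [Path(match) for match in glob(expanded, recursive=recursive)]
```

A call such as glob_expanded('~/pythoncode/*.py') would then return a list of Path objects; the order of the matches is not guaranteed, so sort the result if order matters.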
Messa
0

I found that I really wanted the inline expansion. It wasn't as easy as I thought it'd be.

Anyhow, here's what I've got. Only trivially tested, but let me know where it falls down for you and I'll edit it.

from pathlib import Path

def expand_pathglobs(pathparts, basepaths=None):
    # Logic:
    # 0. Call with a path (str or Path) or Path(str).parts, and optional [Path('/start'), Path('/dirs')].
    # 1. For each basepath, expand pathparts[0] into "expandedpaths".
    # 2. If there are no more pathparts, expandedpaths is the result.
    # 3. Otherwise, recurse with expandedpaths and the remaining pathparts.
    # e.g. expand_pathglobs('/tmp/a*/b*')
    #   --> /tmp/a1/b1
    #   --> /tmp/a2/b2
    # Assumes at least two components, the first being literal (such as '/').

    if isinstance(pathparts, (str, Path)):
        pathparts = Path(pathparts).parts

    if basepaths is None:
        return expand_pathglobs(pathparts[1:], [Path(pathparts[0])])
    else:
        assert pathparts[0] != '/'

    expandedpaths = []
    for p in basepaths:
        assert isinstance(p, Path)
        expandedpaths.extend(p.glob(pathparts[0]))

    if len(pathparts) > 1:
        return expand_pathglobs(pathparts[1:], expandedpaths)

    return expandedpaths
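The same component-by-component idea can also be written as a loop. The following is only a sketch under assumptions: the name expand_components is hypothetical, it adds a call to expanduser() that the function above does not make, and it has been checked on POSIX-style paths only.

```python
from pathlib import Path
from typing import List


def expand_components(pattern: str) -> List[Path]:
    """Hypothetical iterative variant: expand ~, then glob one component at a time."""
    p = Path(pattern).expanduser()
    # Skip the anchor of an absolute path; relative patterns keep every component.
    parts = p.parts[1:] if p.is_absolute() else p.parts
    # Path('') is the current directory, so this start point covers both cases.
    current = [Path(p.anchor)]
    for part in parts:
        # Replace every candidate directory with its matches for this component.
        current = [match for base in current for match in base.glob(part)]
    return current
```

Because each component is globbed separately, a wildcard may appear at any level, including the first component of a relative pattern.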
Autumn