3

Initially I tried

folder_path.glob('[0-9]_*.json')

Where folder_path is a pathlib.Path object. But it only works for files that start with a single digit.

after failing to find a proper match pattern, I used an additional condition to verify that what precedes the underscore is a numeric string

[ file_path for file_path in folder_path.glob('*_*.json') if file_path.name.split('_')[0].isnumeric() ]

but that seems like a workaround that only applies to this case. Is there a better way to match numbers of any length using glob?

Cristian
  • 405
  • 3
  • 10
  • Unfortunately glob is not as flexible as regex. A workaround might be to write folder_path.glob('[0-9]*_*.json') - so it matches any file that has a numeric first character – teambob Dec 15 '22 at 22:48

2 Answers2

1

Use a regular expression to match the path:

import re

res = [file_path for file_path in folder_path.glob('[0-9]*_*.json') if re.match(r"[0-9]+_.*\.json", str(file_path))]
print(res)

Output (example)

[PosixPath('123_abc.json')]

Python's glob modules follows the rules used by the Unix shell, from the documentation:

The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order.

The rules for the Unix shell can be found here these rules don't include variable pattern length, as in your case.

Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
  • Unfortunately this doesn't seem to be supported in python's glob() function. It is more limited than bash. Here is the code for it: https://github.com/python/cpython/blob/main/Lib/glob.py – teambob Dec 15 '22 at 22:50
1

Your solution is fine. When glob patterns don't work, write your own. Glob patterns were invented to make finding files in the shell easier but there is a trade-off between expressiveness and usability. pathlib converts your glob to a regular expression and you could too. pathlib uses the underlying os.listdir and os.scandir utilities, but you could stick with Path.iterdir

import re

my_glob = re.compile(r"\d+_").match
[file_path for file_path in folder_path.iterdir() if my_glob(file_path.name)]
tdelaney
  • 73,364
  • 6
  • 83
  • 116