0

I want to recursively search all files in all folders with pathlib, but I want to exclude hidden system files whose names start with '.' (like '.DS_Store'). I can't find a method like startswith in pathlib. How can I get the equivalent of startswith with pathlib? I know how to do it with os.

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    fcount = len([f for f in root_directory.glob('**/*') if f.startswith(".")])
    print(fcount)
BenjaminK

3 Answers

5

startswith() is a Python string method, see https://python-reference.readthedocs.io/en/latest/docs/str/startswith.html

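Since your f is a Path object, you have to get a string from it first. Note that str(f) is the whole path, so str(f).startswith(".") tests the first character of the path, not of the file name. To test the file name, use f.name, which is already a string, and negate the test to exclude the hidden files:

from pathlib import Path

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    # f.name is the final path component, e.g. '.DS_Store'
    fcount = len([f for f in root_directory.glob('**/*') if not f.name.startswith(".")])
    print(fcount)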
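
To make the difference between the full path and the file name concrete, here is a small sketch. The path 'photos/.DS_Store' is made up for illustration, and PurePosixPath is used only so the result does not depend on the operating system.

```python
from pathlib import PurePosixPath

# Hypothetical example path: a hidden file inside a normal folder
f = PurePosixPath('photos/.DS_Store')

whole_path = str(f)   # the complete path as a string
file_name = f.name    # only the final component, already a str

# The full path starts with 'p', so this test misses the hidden file
starts_whole = whole_path.startswith('.')

# The name starts with '.', so this test finds it
starts_name = file_name.startswith('.')

print(whole_path, starts_whole)
print(file_name, starts_name)
```

The first test is False and the second is True, which is why the check belongs on f.name.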
Peter
4

There is a kind of startswith for paths: you can use pathlib.Path.is_relative_to().

pathlib.Path.is_relative_to() was added in Python 3.9. If you want to use it on earlier versions (3.6 upwards), you need the backport pathlib3x:

$> python -m pip install pathlib3x
$> python
>>> import pathlib3x as pathlib
>>> p = pathlib.Path('/etc/passwd')
>>> p.is_relative_to('/etc')
True
>>> p.is_relative_to('/usr')
False

You can find pathlib3x on GitHub or PyPI.
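
As a rough illustration of how is_relative_to() differs from a plain string prefix test, the sketch below uses the standard library on Python 3.9 or newer (no backport). The path '/etcetera/passwd' is invented for the example.

```python
from pathlib import PurePosixPath

# Hypothetical path whose string form merely begins with '/etc'
p = PurePosixPath('/etcetera/passwd')

# A string prefix test matches, although the file is not inside /etc
string_prefix = str(p).startswith('/etc')

# is_relative_to() (Python 3.9+) compares whole path components instead
component_prefix = p.is_relative_to('/etc')

print(string_prefix, component_prefix)  # True False
```

So is_relative_to() answers "is this path below that directory?", which is a different question from "does this file name start with a dot?".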

But this still will not help for your example, because you want to skip files whose names start with '.'. So your solution is correct, but not very memory-efficient:

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    fcount = len([f for f in root_directory.glob('**/*') if not str(f.name).startswith(".")])
    print(fcount)

Imagine you have 2 million files in scan_path: this would create a list of 2 million pathlib.Path objects. That will take some time and memory.

It would be better to have a kind of filter like fnmatch or something for the glob function - I am considering it for pathlib3x.
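
Until glob has such a filter, one can be approximated with the standard library's fnmatch. The sketch below is only one possible approach: the helper name iter_visible and the pattern '.*' are assumptions for illustration, not part of pathlib or pathlib3x. It also skips everything inside hidden folders (such as '.git'), which a test on f.name alone would not do.

```python
import tempfile
from fnmatch import fnmatch
from pathlib import Path


def iter_visible(root, hidden_pattern='.*'):
    """Yield entries below root with no path component matching hidden_pattern."""
    root = Path(root)
    for f in root.glob('**/*'):
        # Only look at the components below root, not at root's own path
        parts = f.relative_to(root).parts
        if not any(fnmatch(part, hidden_pattern) for part in parts):
            yield f


# Throwaway directory tree with made-up names, just to exercise the helper
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / '.git').mkdir()
    (root / '.git' / 'config').touch()
    (root / 'notes.txt').touch()
    (root / '.DS_Store').touch()
    visible_names = sorted(f.name for f in iter_visible(root))

print(visible_names)  # ['notes.txt']
```

Because iter_visible is a generator, it can be counted or looped over without building a list first.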

Path.glob() returns a generator iterator, which needs much less memory as long as you consume it one item at a time instead of collecting everything into a list.

So, in order to save memory, the solution can be:

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    fcount = 0
    # only one Path instance f is alive at a time
    for f in root_directory.glob('**/*'):
        if not f.name.startswith("."):
            fcount = fcount + 1
    print(fcount)
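
The same idea can be written more compactly with sum() over a generator expression. This is a sketch under two assumptions that go beyond the code above: glob('**/*') also yields directories, so f.is_file() is added on the guess that only regular files should be counted, and the function returns the count instead of printing it so it can be reused.

```python
import tempfile
from pathlib import Path


def recursive_file_count(scan_path):
    """Return the number of regular, non-dot files below scan_path."""
    root_directory = Path(scan_path)
    # sum() consumes the generator one Path at a time, so no list is built
    return sum(
        1
        for f in root_directory.glob('**/*')
        if f.is_file() and not f.name.startswith('.')
    )


# Throwaway directory tree with made-up names to try the function
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub').mkdir()
    (root / 'sub' / 'b.txt').touch()
    (root / 'a.txt').touch()
    (root / '.DS_Store').touch()
    fcount = recursive_file_count(root)

print(fcount)  # 2
```

Drop the f.is_file() condition if folders should be counted as well.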


Disclaimer: I'm the author of the pathlib3x library.

bitranox
0

My solution:

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    fcount = len([f for f in root_directory.glob('**/*') if not str(f.name).startswith(".")])
    print(fcount)
BenjaminK