0

I have a text file full of file names, like:

C:\Folder\Subfolder_01\file_1001.csv
C:\Folder\Subfolder_02\file_3030.xls
...

I want to check whether the files still exist (which is easy) or whether the name of the subfolder has changed. Some subfolders were renamed by adding a string in front of the name, starting with a 4-digit number (e.g. C:\Folder\Subfolder_02\file_3030.xls has changed to C:\Folder\2019 - Subfolder_02\file_3030.xls).

I tried to solve this with pathlib's Path.glob(). It works for one specific file 'by hand', like this:

list(file.parent.parent.glob('* - Subfolder_02/file_3030.xls'))

which returns a list with the new file name. But I failed to do this in a loop, building the glob pattern from variables.

This is what I have so far, but my attempt to concatenate the glob pattern with other variables (using +) fails for obvious reasons:

import pathlib

file = pathlib.Path('file_names.txt')
lines=[]

with open(file,'r') as f:
    # reading the txt-file line by line         
    for line in f:
        line = line.replace("\r", "").replace("\n", "")
        lines.append(line)

for file in lines:
    file = pathlib.Path(file)
    # check if file exists ...
    if file.exists():
        print('OK - ' + file.name)
    # ... if not, find new location
    else:
        new_files = list(file.parent.parent.glob('* - ') + file.name)
        print(new_files)
Cohan
rhombuzz

2 Answers

1

I would set your top directory as a Path and, if a file is no longer in its original location, glob for it under that directory. Using ** in the pattern makes the search recursive, so it covers all subfolders.

from pathlib import Path

# Set top level directory as desired.
parent_dir = Path('.')

# you can use splitlines() to parse the file into a list
with Path('file_names.txt').open() as f:
    files = f.read().splitlines()

for f in files:
    orig = Path(f)

    # Still in location, no need to look further
    if orig.exists():
        print(f"{orig.absolute()} is still in place.")
        continue

    # See if we can find it under parent_dir
    matches = [*parent_dir.glob(f"**/{orig.name}")]

    if len(matches) > 1:
        print("Multiple Matches Found")

    for match in matches:
        print(f"{orig.absolute()} might be in {match.absolute()}")
Cohan
  • Thanks for the work! The idea to use f-strings works like a charm. For my purposes I modified it a little bit to `matches = [*file.parent.parent.glob(f"* {file.parts[-2]}/{file.name}")]`. – rhombuzz Jan 17 '20 at 05:43
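The variant from the comment above can be wrapped in a small helper. This is only a minimal sketch, not the poster's exact code: the name `find_renamed` is hypothetical, and it assumes the renamed folder still ends with the original folder name, is separated from the new prefix by a space, and stayed in the same grandparent folder. Folder names containing glob special characters such as `[` would need escaping first.

```python
from pathlib import Path


def find_renamed(path):
    """Return candidate locations for a file whose parent folder was renamed.

    Assumes the folder only gained a prefix, e.g. 'Subfolder_02' became
    '2019 - Subfolder_02', and that it was not moved elsewhere.
    """
    path = Path(path)
    if path.exists():
        # Nothing changed; the original location is still valid.
        return [path]
    # '*' stands for the added prefix; the rest of the pattern is the
    # old folder name followed by the unchanged file name.
    pattern = f"* {path.parent.name}/{path.name}"
    return sorted(path.parent.parent.glob(pattern))
```

Calling `find_renamed(line)` for each line of the text file returns an empty list when no renamed folder matches, so a missing file can be told apart from a moved one.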
0

Try watchdog

For example:

import os
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

RESOURCES_PATH = r"C:\Folder"

class dirs_watcher(FileSystemEventHandler):

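    def __init__(self):
        # Take the initial listing before the observer thread starts,
        # so on_modified never runs with cur_dirs unset.
        self.cur_dirs = os.listdir(RESOURCES_PATH)
        self.observe()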

    def observe(self):
        self.observer = Observer()
        self.my_watch = self.observer.schedule(self, path=RESOURCES_PATH, recursive=True)
        self.observer.start()

    def on_modified(self, event=None):
        # A folder was modified:
        self.new_dirs = os.listdir(RESOURCES_PATH)
        old = set(self.cur_dirs) - set(self.new_dirs)
        new = set(self.new_dirs) - set(self.cur_dirs)
        print("{} changed to {}".format(old, new))

        self.cur_dirs = self.new_dirs # update cur_dirs


on_modified will be triggered when a subdirectory changes, and you can work out which folder names changed by keeping a list of the subdirectories.
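The bookkeeping inside on_modified comes down to two set differences. Here is a minimal sketch of that step on its own, using only the standard library; `diff_dir_names` is a hypothetical helper name and not part of watchdog, and the two listings are made-up sample data based on the folder names in the question.

```python
def diff_dir_names(old_names, new_names):
    """Return (removed, added) names between two directory listings."""
    old_set, new_set = set(old_names), set(new_names)
    removed = sorted(old_set - new_set)  # present before, gone now
    added = sorted(new_set - old_set)    # new since the last listing
    return removed, added


# Sample listings (assumed for illustration only).
before = ["Subfolder_01", "Subfolder_02"]
after = ["Subfolder_01", "2019 - Subfolder_02"]
removed, added = diff_dir_names(before, after)
print(f"{removed} changed to {added}")
```

With several renames between two listings, the two lists alone do not say which old name maps to which new one, so the names would still have to be paired up, for example by matching the common suffix.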

Adam Rosenthal