
My question is somewhat related to "How to improve searching with os.walk and fnmatch" but I want to expand a little.

Let's assume we have a file collection on a hard drive that is roughly 10-50 TB in size. I want to find all files with a specific extension on a regular basis. The collection changes daily as new files are added. On the first run, I want to store the gathered information so that subsequent runs only need to examine the files that have changed. I understand this as a kind of indexing of the file system, and I hope it will greatly speed up every subsequent search.
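To make the idea concrete, here is a minimal sketch of what I have in mind, using only the standard library. The function names (`build_index`, `update_index`, `search`) and the JSON persistence format are my own choices for illustration, not an existing tool:

```python
import fnmatch
import json
import os


def build_index(root):
    """Walk the tree once and record every file's path and mtime."""
    index = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                index[path] = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return index


def update_index(root, old_index):
    """Re-walk the tree and report which files are new or modified."""
    new_index = {}
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue
            new_index[path] = mtime
            if old_index.get(path) != mtime:
                changed.append(path)  # not in old index, or mtime differs
    return new_index, changed


def search(index, pattern):
    """Match file names in the index against a glob pattern; no disk I/O."""
    return [p for p in index
            if fnmatch.fnmatch(os.path.basename(p), pattern)]


def save_index(index, path):
    with open(path, "w") as fh:
        json.dump(index, fh)


def load_index(path):
    with open(path) as fh:
        return json.load(fh)
```

Note that `update_index` still has to walk every directory; it only avoids re-processing unchanged files. Avoiding the walk entirely would require filesystem change notifications of some kind.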

I prefer working in Python, but pointers to ready-made software solutions or open-source projects in other languages are also greatly appreciated.

  • The most efficient way I can think of to do this is to do a once over index of the file system, then use real-time filesystem notifications like [pyinotify](https://github.com/seb-m/pyinotify) – James Mills May 27 '15 at 09:49
  • That looks pretty promising as long as the hard drive is accessed via Linux. Is there a similar solution for Windows? – Dschoni May 27 '15 at 10:13
  • Sadly no; not that I'm aware of. Although I'm no *Windows* developer; there could be some [``win32``](http://sourceforge.net/projects/pywin32/) API(s) that *might* do something similar; but usually inotify-like API(s) are simulated with threading on Windows :/ – James Mills May 27 '15 at 10:15
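For reference, the pyinotify approach from the first comment might look roughly like this. This is a sketch, not tested against the asker's setup: it assumes pyinotify is installed, the watched path is hypothetical, and `index` stands in for whatever store the initial full walk produced. There is no `<test>` section because the notifier loop blocks and depends on a third-party library:

```python
import os

import pyinotify

WATCH_ROOT = "/data/collection"  # hypothetical path to the file collection
index = {}  # path -> mtime, e.g. loaded from the initial full walk


class IndexUpdater(pyinotify.ProcessEvent):
    """pyinotify dispatches events to process_<EVENT_NAME> methods."""

    def process_IN_CLOSE_WRITE(self, event):
        index[event.pathname] = os.stat(event.pathname).st_mtime

    def process_IN_MOVED_TO(self, event):
        index[event.pathname] = os.stat(event.pathname).st_mtime

    def process_IN_DELETE(self, event):
        index.pop(event.pathname, None)

    def process_IN_MOVED_FROM(self, event):
        index.pop(event.pathname, None)


wm = pyinotify.WatchManager()
mask = (pyinotify.IN_CLOSE_WRITE | pyinotify.IN_DELETE
        | pyinotify.IN_MOVED_TO | pyinotify.IN_MOVED_FROM)
# rec=True watches subdirectories; auto_add=True also watches new ones.
wm.add_watch(WATCH_ROOT, mask, rec=True, auto_add=True)

notifier = pyinotify.Notifier(wm, IndexUpdater())
notifier.loop()  # blocks, updating the index as files change
```

This only works on Linux (it wraps the kernel's inotify API), which matches the limitation discussed in the comments above.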

0 Answers