I'm looking for a fast way to build a list of files, with certain attributes, by reading from disk in parallel.
Attributes: file size, absolute file path
Currently I'm using Boost.Filesystem and a recursive call with directory iterators. It's fine for small datasets, but for a million files in, say, 50,000 folders it's not great.
Usage environment: OS: FreeBSD, Linux, Windows. Filesystems: ZFS, ext4, NTFS.
Basic Idea:
- Thread Pool
- SubTreeWalker Object
- Partition root folder among threads
- For each new directory it finds in its subtree, a SubTreeWalker asks the thread pool whether there are idle threads
- If there are, the directory is assigned to a SubTreeWalker object in an idle thread
What do you think of the basic idea? Is it sound? Are there any implications of parallel access to the B+ tree of the filesystem?