I have two sets of paths, with maybe 5000 entries in the first set and 10000 entries in the second. The first set is a subset of the second. I need to check whether any entry in the second set is a child of any entry in the first set (i.e. whether it is a subdirectory of, or a file inside, a directory from the first set). There are some additional requirements:
- No operations on the file system, it should be done only on the path strings (except for dealing with symlinks if needed).
- Platform independent (e.g. upper/lower case, different separators)
- It should be robust with respect to different ways of expressing the same path.
- It should deal with both symlinks and their targets.
- Some paths will be absolute and some relative.
- This should be as fast as possible!
I'm thinking along the lines of getting both `os.path.abspath()` and `os.path.realpath()` for each entry and then comparing them with `os.path.commonpath([parent]) == os.path.commonpath([parent, child])`. I can't come up with a good way of running this fast though. Or is it safe to just compare the strings directly? That would make it much, much easier. Thanks!
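Roughly, the pairwise check I have in mind looks like the sketch below. The names `normalize` and `is_child` are made up for illustration, and I have left out `os.path.realpath()` because it touches the file system; `os.path.normcase()` is my assumption for handling case and separators per platform.

```python
import os.path


def normalize(path):
    # abspath() resolves relative paths against the current directory and
    # collapses "." / ".." purely on the string; normcase() lowercases and
    # unifies separators on Windows and leaves the path unchanged on POSIX.
    return os.path.normcase(os.path.abspath(path))


def is_child(parent, child):
    # Hypothetical helper: True if `child` lies strictly below `parent`.
    parent = normalize(parent)
    child = normalize(child)
    if parent == child:
        return False  # a path is not treated as its own child here
    try:
        # Same comparison as described above, done on normalized paths.
        return os.path.commonpath([parent]) == os.path.commonpath([parent, child])
    except ValueError:
        # commonpath() raises ValueError, e.g. for different drives on Windows.
        return False
```

Calling this for every parent/child pair would mean up to 5000 × 10000 calls, which is the part I don't know how to make fast.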
EDIT: I was a bit unclear about the platform independence. It should work for all platforms, but there won't be for example Windows and Unix style paths mixed.