5

I'm using os.walk with followlinks=True, but I hit a place where a symbolic link refers to its own directory, causing an infinite loop. The culprit in this case is /usr/bin/X11, which is listed as follows:

lrwxrwxrwx 1 root root           1 Apr 24  2015 X11 -> .

Is there any way to avoid following links to either . or .., which I assume would cause similar problems? I think I could check this with os.readlink and then compare the result against the current path. Is there any other solution?
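
The path-comparison idea described above could be sketched as follows, using os.path.realpath instead of a bare os.readlink so that relative targets and chained links are resolved. The helper names `points_to_ancestor` and `walk_skipping_ancestor_links` are hypothetical, and the sketch assumes POSIX-style paths. It only catches links that resolve to the directory itself or to one of its parents (such as `X11 -> .` or a link to `..`); a cycle formed by two links in sibling directories pointing at each other would still loop.

```python
import os

def points_to_ancestor(dirpath, name):
    """Return True if dirpath/name is a symlink that resolves to dirpath
    itself or to one of its parent directories."""
    full = os.path.join(dirpath, name)
    if not os.path.islink(full):
        return False
    target = os.path.realpath(full)
    here = os.path.realpath(dirpath)
    # target is here or a parent of here exactly when their common path is target
    return os.path.commonpath([target, here]) == target

def walk_skipping_ancestor_links(top):
    for dirpath, dirnames, filenames in os.walk(top, followlinks=True):
        # Prune in place so os.walk does not descend into the offending links.
        dirnames[:] = [d for d in dirnames if not points_to_ancestor(dirpath, d)]
        yield dirpath, dirnames, filenames
```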

Eric

2 Answers

7

If you want to avoid infinite recursion, there is no way around storing the set of directories you have already visited. You do not need readlink for this, however: you can store each directory's device and inode numbers instead. This avoids the problem of path canonicalization altogether.

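import os

# Directories already seen, identified by (device, inode) pairs.
dirs = set()
for dirpath, dirnames, filenames in os.walk('.', followlinks=True):
    if not dirs:
        # Record the starting directory so a link back to it is not followed.
        st = os.stat(dirpath)
        dirs.add((st.st_dev, st.st_ino))
    scandirs = []
    for dirname in dirnames:
        st = os.stat(os.path.join(dirpath, dirname))
        dirkey = st.st_dev, st.st_ino
        if dirkey not in dirs:
            dirs.add(dirkey)
            scandirs.append(dirname)
    # Edit the list in place so os.walk only descends into unseen directories.
    dirnames[:] = scandirs
    print(dirpath)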
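
The `dirnames[:] = scandirs` line relies on slice assignment changing the list object that os.walk itself holds, whereas a plain `dirnames = scandirs` would only rebind the local name. A minimal illustration of the difference, with hypothetical function names and sample data that are not part of the answer:

```python
def prune_in_place(names):
    # Slice assignment replaces the contents of the caller's list object.
    names[:] = [n for n in names if not n.startswith('.')]

def prune_by_rebinding(names):
    # Plain assignment only rebinds the local name; the caller's list is untouched.
    names = [n for n in names if not n.startswith('.')]

a = ['.git', 'src']
prune_in_place(a)

b = ['.git', 'src']
prune_by_rebinding(b)

print(a)  # ['src']
print(b)  # ['.git', 'src']
```

os.walk is in the position of the caller here: it keeps a reference to the list it yielded and, in top-down mode, descends only into the names still present in that list.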
Dietrich Epp
  • Ok, it doesn't have to be ugly =) – Eric May 02 '16 at 08:00
  • Isn't this risky if your symlinks cross filesystem boundaries? You could have different files with the same inode on two distinct filesystems, no? – gimboland Aug 28 '17 at 18:26
  • @gimboland: Look at the code: `dirkey = st.st_dev, st.st_ino`. – Dietrich Epp Aug 28 '17 at 18:46
  • Ah yes, sorry I missed that; nice one. – gimboland Aug 29 '17 at 09:01
  • What if I don't mind the same directory being included a number of times by means of symlinks yet would like to avoid recursion? – Ivan Nov 26 '18 at 23:14
  • @Ivan: Then you don't need anything from this answer, you can just use ordinary `os.walk()` by itself. – Dietrich Epp Nov 27 '18 at 00:56
  • @DietrichEpp os.walk() documentation explicitly mentions it is vulnerable to recursive symlinks with followlinks=True. – Ivan Nov 27 '18 at 01:05
  • @Ivan: The `os.walk` function does not make that easy. You'll need to only check the parent directories, which requires doing something like maintaining a separate stack of inodes (and figuring out how many you need to pop each time through the loop), maintaining a map from paths to inodes (and walking up the tree), or writing your own version of `os.walk`. – Dietrich Epp Nov 27 '18 at 01:38
  • @DietrichEpp I thought of converting the full path of every item os.walk() finds to a list of inodes and discarding items whose path contains their own inode, but I quickly dismissed the idea, realizing that this won't prevent os.walk() itself from continuing to traverse a recursive path. Now the only idea I have is giving up on os.walk and implementing the whole thing manually with os.listdir(). – Ivan Nov 27 '18 at 03:25
  • @Ivan: Don't dismiss that idea so quickly, `os.walk` won't traverse a recursive path if you remove it. This is what the `dirnames[:] = scandirs` line does in the example above. – Dietrich Epp Nov 27 '18 at 07:03
  • @DietrichEpp Thanks. I'll try that. BTW why `dirnames[:] = scandirs` and not `dirnames = scandirs`? – Ivan Nov 27 '18 at 11:07
  • This modifies the existing list. – Dietrich Epp Nov 27 '18 at 14:54
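
The variant discussed in the comments above, where a directory may be listed more than once through different links but a link back into the current chain of parent directories is not followed, could be sketched by writing a small recursive walker rather than using os.walk. The name `walk_no_cycles` and its `_ancestors` parameter are hypothetical. This is a minimal sketch, assuming that (st_dev, st_ino) identifies a directory; it does no error handling and is not a drop-in replacement for os.walk.

```python
import os

def walk_no_cycles(top, _ancestors=frozenset()):
    """Yield (dirpath, dirnames, filenames) while following symlinks,
    but never enter a directory that is one of its own ancestors."""
    st = os.stat(top)
    key = (st.st_dev, st.st_ino)
    if key in _ancestors:
        return  # entering this directory would loop back into the current chain
    ancestors = _ancestors | {key}

    dirnames, filenames = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            # is_dir() follows symlinks by default; broken links count as files
            if entry.is_dir():
                dirnames.append(entry.name)
            else:
                filenames.append(entry.name)

    yield top, dirnames, filenames
    for name in dirnames:
        yield from walk_no_cycles(os.path.join(top, name), ancestors)
```

Because only the chain of parents is tracked, rather than every directory seen so far, a directory reachable through two different links is reported once per route, while a link such as `X11 -> .` is listed in `dirnames` but not entered.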
2

To completely avoid the problem of infinite recursion (with links pointing anywhere), you need to store the files and/or directories you have already visited.

The people behind the pynotify module had the same issue and used the described method. The patch is in the link ;)

salomonderossi