
On one of my machines, cd /var/lock fails even though /var/lock is a symlink to an existing directory, ../run/lock.

After further investigation, I found that any symlink that points into another mount point through a relative path fails. This only happens on this particular machine.

For example, assuming I have 3 files /var/foo, /data/foo and /run/foo, with

  • /dev/vda mounted on /
  • /dev/vdb mounted on /data
  • tmpfs mounted on /run

[Screenshot of the ls output for these paths; image not available]

[root@VM-16-197-centos ~]# cat /proc/self/mountinfo
18 40 0:18 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw
19 40 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
20 40 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,size=32890360k,nr_inodes=8222590,mode=755
21 18 0:17 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - securityfs securityfs rw
22 20 0:19 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw
23 20 0:12 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
24 40 0:20 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,mode=755
25 18 0:21 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:8 - tmpfs tmpfs ro,mode=755
26 25 0:22 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
27 18 0:23 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:20 - pstore pstore rw
28 25 0:24 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,hugetlb
29 25 0:25 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,cpuset
30 25 0:26 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,blkio
31 25 0:27 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,net_prio,net_cls
32 25 0:28 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,freezer
33 25 0:29 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,cpuacct,cpu
34 25 0:30 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,perf_event
35 25 0:31 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,memory
36 25 0:32 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,devices
37 25 0:33 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,pids
38 18 0:34 / /sys/kernel/config rw,relatime shared:21 - configfs configfs rw
40 0 253:1 / / rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered
16 19 0:16 / /proc/sys/fs/binfmt_misc rw,relatime shared:23 - autofs systemd-1 rw,fd=33,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11560
42 20 0:15 / /dev/mqueue rw,relatime shared:24 - mqueue mqueue rw
41 20 0:36 / /dev/hugepages rw,relatime shared:25 - hugetlbfs hugetlbfs rw
43 18 0:6 / /sys/kernel/debug rw,relatime shared:26 - debugfs debugfs rw
74 18 0:37 / /sys/fs/fuse/connections rw,relatime shared:55 - fusectl fusectl rw
76 24 0:38 / /run/user/0 rw,nosuid,nodev,relatime shared:57 - tmpfs tmpfs rw,size=6580380k,mode=700
78 16 0:39 / /proc/sys/fs/binfmt_misc rw,relatime shared:59 - binfmt_misc binfmt_misc rw
80 40 253:16 / / rw,relatime shared:61 - ext4 /dev/vdb rw,data=ordered
82 80 253:1 / / rw,relatime shared:63 - ext4 /dev/vda1 rw,data=ordered
84 40 253:16 / /data rw,relatime shared:65 - ext4 /dev/vdb rw,data=ordered
[root@VM-16-197-centos ~]# cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Thu Mar  7 06:38:37 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=4b499d76-769a-40a0-93dc-4a31a59add28 /                       ext4    defaults        1 1
UUID=6906702c-65dd-4664-adf0-31ed67c92dab /                       ext4    defaults        1 1
[root@VM-16-197-centos ~]# readlink /proc/self/ns/{mnt,user} /proc/1/ns/{mnt,user}
mnt:[4026531840]
user:[4026531837]
mnt:[4026531840]
user:[4026531837]

I suspect it's a bug in the kernel. Kernel version: 3.10.0-1160.31.1.el7.x86_64 #1 SMP Thu Jun 10 13:32:12 UTC 2021.

SELinux is disabled.

yyyy
  • There is no way to tell with the given information. Please provide the output of `ls -l` for every element of a symlink path. – Gerald Schneider Jul 13 '22 at 06:24
  • @GeraldSchneider I added a screenshot of `ls`. Hope it's clear enough – yyyy Jul 13 '22 at 07:38
  • Please add the output of `/proc/self/mounts` – Matthew Ife Jul 13 '22 at 07:42
  • Also, can you provide the SELinux contexts these paths are mounted with, using `ll -d -Z` and `id -Z`? It's possible the `readlink` call is not permitted in the current SELinux context. – Matthew Ife Jul 13 '22 at 07:50
  • @MatthewIfe added `/proc/self/mounts`. SELinux is disabled on this machine. – yyyy Jul 13 '22 at 07:56
  • Odd that you have some of those mounts there twice. Pulling at straws a little here, but what is the output of `readlink /proc/{1,self}/ns/{mnt,user}`? (They typically should both be equal for each named file.) – Matthew Ife Jul 13 '22 at 08:15
  • @MatthewIfe they are in the same mnt and user ns – yyyy Jul 13 '22 at 08:26
  • Just to be clear - what actually happens if you were to call `cat /data/link2` is it complaining the file doesn't exist? – Matthew Ife Jul 13 '22 at 08:31
  • @MatthewIfe yes. `strace` shows `open("/var/link2", O_RDONLY) = -1 ENOENT` – yyyy Jul 13 '22 at 08:33
  • I wonder if it's because `/` is mounted twice. If so, maybe I can solve this by `umount /` once, but I don't dare. – yyyy Jul 13 '22 at 08:43
  • I think you've got the root path mounted twice and I believe that relates to the problem, although I don't know why yet. If you replace `cat /proc/self/mounts` with `cat /proc/self/mountinfo`, that might show the parent/child relationship between the mounts. – Matthew Ife Jul 13 '22 at 08:43
  • @MatthewIfe I see two different devices mounted on `/` in `/etc/fstab`, which looks like a misconfiguration. This is not my machine; I will contact its owner to solve this. – yyyy Jul 13 '22 at 08:50

1 Answer


OK so according to /proc/self/mountinfo you've got some pretty weird mount relationships going on here.

In addition, your /etc/fstab has two entries that mount the root path from two different UUIDs, which looks like the root cause.

The following mountinfo lines show a bizarre relationship between mounts.

40 0 253:1 / / rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered
80 40 253:16 / / rw,relatime shared:61 - ext4 /dev/vdb rw,data=ordered
82 80 253:1 / / rw,relatime shared:63 - ext4 /dev/vda1 rw,data=ordered
84 40 253:16 / /data rw,relatime shared:65 - ext4 /dev/vdb rw,data=ordered

The first line is the first mount (most likely the original root mount from boot). It has no parent (0 in the second field). On top of the root path you then have /dev/vdb mounted: its parent in the second field is 40, the ID of the first root mount, so the root of the VFS you see is replaced by /dev/vdb. This probably comes from /etc/fstab and is a mistake (one of the / lines in fstab carries the UUID of /dev/vdb).

Following from this, /dev/vda1 is mounted again on top of that mount. Its parent ID in the second field (80) is the mount ID in the first field of the second line.

The fourth line shows that /dev/vdb (mount ID 84) is also mounted at /data, as a child of the original / (40). This is probably the intended setup. It is also why you get the problem with /data when you look at it in your own setup.

In effect, the root you've landed on is invalid: if you go up a directory relative to this root and then down into data again, there is no child mount for /data under that / mount.

You can see /data if you perform an absolute lookup, as you are doing with ls -ld, because the way relative paths are resolved relies on the mountpoint parent/child relationships, whereas absolute lookups do not.
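To make the parent/child reading of those four lines easier to follow, here is a minimal Python sketch that parses them and reports which mount points carry more than one mount, plus the chain of parents for a given mount ID. The helper names (parse_mountinfo, stacked_mount_points, parent_chain) are my own, not part of any tool, and the parser only reads the fields needed here.

```python
from collections import defaultdict

# The four lines quoted above, copied from /proc/self/mountinfo.
MOUNTINFO = """\
40 0 253:1 / / rw,relatime shared:1 - ext4 /dev/vda1 rw,data=ordered
80 40 253:16 / / rw,relatime shared:61 - ext4 /dev/vdb rw,data=ordered
82 80 253:1 / / rw,relatime shared:63 - ext4 /dev/vda1 rw,data=ordered
84 40 253:16 / /data rw,relatime shared:65 - ext4 /dev/vdb rw,data=ordered
"""


def parse_mountinfo(text):
    """Return one dict per mountinfo line (hypothetical helper)."""
    mounts = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Fields before " - " are per-mount; after it come fstype and source.
        left, right = line.split(" - ", 1)
        fields = left.split()
        source = right.split()[1]
        mounts.append({
            "id": int(fields[0]),        # first field: mount ID
            "parent": int(fields[1]),    # second field: parent mount ID
            "mount_point": fields[4],    # fifth field: mount point
            "source": source,
        })
    return mounts


def stacked_mount_points(mounts):
    """Map each mount point that has several mounts to their IDs."""
    by_point = defaultdict(list)
    for m in mounts:
        by_point[m["mount_point"]].append(m["id"])
    return {p: ids for p, ids in by_point.items() if len(ids) > 1}


def parent_chain(mounts, mount_id):
    """Follow parent IDs from mount_id up to the topmost known mount."""
    parents = {m["id"]: m["parent"] for m in mounts}
    chain = [mount_id]
    while parents.get(chain[-1]) in parents:
        chain.append(parents[chain[-1]])
    return chain


MOUNTS = parse_mountinfo(MOUNTINFO)
print(stacked_mount_points(MOUNTS))  # mount points with stacked mounts
print(parent_chain(MOUNTS, 84))      # /data hangs off the original root
print(parent_chain(MOUNTS, 82))      # the topmost root sits on two others
```

On a live system the same functions could be fed the contents of /proc/self/mountinfo instead of the pasted string.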
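If you want to find other symlinks affected by this, a rough Python sketch like the following could help. It compares what the kernel says when it follows a link with what you get by collapsing the ".." in the link target textually and checking the result as an absolute path. The function names are hypothetical, and the textual collapse is an assumption that only holds when no intermediate component of the target is itself a symlink.

```python
import os


def kernel_resolves(path):
    """True if the kernel can follow `path` to an existing target."""
    try:
        os.stat(path)  # stat follows symlinks inside the kernel
        return True
    except OSError:
        return False


def lexical_target(link):
    """Absolute target of `link`, with '..' collapsed textually.

    Assumption: no directory in the target is itself a symlink.
    """
    target = os.readlink(link)
    base = os.path.dirname(os.path.abspath(link))
    return os.path.normpath(os.path.join(base, target))


def suspicious(link):
    """A link the kernel cannot follow although its textual target exists."""
    return (
        os.path.islink(link)
        and not kernel_resolves(link)
        and os.path.lexists(lexical_target(link))
    )
```

On a healthy machine suspicious("/var/lock") should be False. On a host in the state described here I would expect it to be True, since the absolute path /run/lock exists while the relative walk fails, but I have not run it on such a host.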

To fix this you're going to need to:

  • Fix the fstab entry by making sure the UUID for /dev/vdb goes to /data and not /.
  • Identify which processes are using the new root.
  • Stop these processes.
  • Unmount the bad root.

But frankly it's probably easier to fix the fstab and then reboot the host to correct the mount state.
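Before rebooting, it may be worth checking that the edited fstab no longer lists any mount point twice. Below is a small Python sketch of such a check, run here against the two entries from the question. duplicate_mount_points is a name I made up, and the sketch only looks at the first two fields of each line.

```python
from collections import defaultdict

# The non-comment entries from the /etc/fstab in the question.
FSTAB = """\
#
# /etc/fstab
#
UUID=4b499d76-769a-40a0-93dc-4a31a59add28 /                       ext4    defaults        1 1
UUID=6906702c-65dd-4664-adf0-31ed67c92dab /                       ext4    defaults        1 1
"""


def duplicate_mount_points(fstab_text):
    """Map each mount point listed more than once to its device specs."""
    specs = defaultdict(list)
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # not a usable fstab entry
        spec, mount_point = fields[0], fields[1]
        if mount_point in ("none", "swap"):
            continue  # swap entries have no real mount point
        specs[mount_point].append(spec)
    return {mp: s for mp, s in specs.items() if len(s) > 1}


print(duplicate_mount_points(FSTAB))
```

An empty result after the edit means every mount point appears only once. To check the real file, pass open("/etc/fstab").read() instead of the pasted string.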

Matthew Ife