I have a multipath config that was working but now shows a "faulty" path:
[root@nas ~]# multipath -ll
sdd: checker msg is "readsector0 checker reports path is down"
mpath1 (36001f93000a63000019f000200000000) dm-2 XIOTECH,ISE1400
[size=200G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
\_ 1:0:0:1 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 2:0:0:1 sdd 8:48 [active][faulty]
At the same time, I'm seeing these three lines over and over in /var/log/messages:
Feb 5 12:52:57 nas kernel: sd 2:0:0:1: SCSI error: return code = 0x00010000
Feb 5 12:52:57 nas kernel: end_request: I/O error, dev sdd, sector 0
Feb 5 12:52:57 nas kernel: Buffer I/O error on device sdd, logical block 0
And this line shows up fairly often too:
Feb 5 12:52:58 nas multipathd: sdd: readsector0 checker reports path is down
One thing I don't understand is why it's using the readsector0 checking method when my /etc/multipath.conf file says to use tur:
[root@nas ~]# tail -n15 /etc/multipath.conf
devices {
    device {
        vendor "XIOTECH "
        product "ISE1400 "
        path_grouping_policy multibus
        getuid_callout "/sbin/scsi_id -g -u -d /dev/%n"
        path_checker tur
        prio_callout "none"
        path_selector "round-robin 0"
        failback immediate
        no_path_retry 12
        user_friendly_names yes
    }
}
Looking at the upstream documentation (http://christophe.varoqui.free.fr/usage.html), this paragraph seems relevant:
For each path:
\_ host:channel:id:lun devnode major:minor [path_status][dm_status_if_known]
The dm status (dm_status_if_known) is like the path status
(path_status), but from the kernel's point of view. The dm status has two
states: "failed", which is analogous to "faulty", and "active" which
covers all other path states. Occasionally, the path state and the
dm state of a device will temporarily not agree.
It's been well over 24 hours for me, so it's not temporary.
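To make that kind of disagreement easier to spot, here is a rough sketch that filters `multipath -ll` output for paths where the two bracketed states don't line up. It assumes the rule from the quoted paragraph ("failed" pairs with "faulty", "active" pairs with everything else). Since my own output shows `[active]` first, it identifies the dm state by value ("active" or "failed") rather than by position, which is an assumption on my part. The function name `find_disagreeing_paths` is made up for illustration and is not part of multipath-tools.

```shell
#!/bin/sh
# Sketch: list paths whose checker state and dm state disagree.
# Assumption: a bracketed "active" or "failed" is the dm state and any other
# bracketed word (ready, faulty, ...) is the checker's path state.
# find_disagreeing_paths is a hypothetical helper name, not part of multipath-tools.
find_disagreeing_paths() {
    awk '
    # Path lines look like:  \_ 2:0:0:1 sdd 8:48 [active][faulty]
    $2 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ && $4 ~ /^[0-9]+:[0-9]+$/ {
        states = $5
        gsub(/\]\[/, " ", states)   # "[active][faulty]" -> "[active faulty]"
        gsub(/\[|\]/, "", states)   # strip the remaining brackets
        n = split(states, word, " ")
        dm = ""; path = ""
        for (i = 1; i <= n; i++) {
            if (word[i] == "active" || word[i] == "failed") dm = word[i]
            else path = word[i]
        }
        if (dm == "" || path == "") next   # dm state not known, nothing to compare
        # "failed" is analogous to "faulty"; "active" covers every other path state.
        if ((dm == "failed") != (path == "faulty"))
            print $3, $2, "path=" path, "dm=" dm
    }'
}
```

It reads from standard input, so it would be used as `multipath -ll | find_disagreeing_paths`. Path-group lines (the `round-robin` ones) are skipped because their second field is not a host:channel:id:lun tuple.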
So with all that as background, my questions are:
- How can I determine the root cause here?
- How can I manually (from the command line) perform whatever check it's doing?
- Why is it ignoring my multipath.conf (did I do it wrong)?
Thanks in advance for any ideas. If there's any other info I can provide, let me know in a comment and I'll edit it into the post.