We currently have a server running Oracle Linux 5.8 to which we present clones of production LUNs; we then open a clone of the database to run several queries.

The problem begins when we run this command to discover the newly cloned LUN:

find /sys/class/scsi_host/host*/scan | while read line; do echo - - - > $line; done

After that, multipathd stops working properly and we can't start the ASM instance. The only fix is to reboot the server; once it has booted, everything works.

We have the same setup on another server running AIX, and there discovering the cloned LUN with cfgmgr works fine.

Any ideas on how to make the process of removing, presenting and discovering the LUN work cleanly?

Thanks.

Juan
  • When I did the same thing on Red Hat 5.8 it worked, and I know Oracle Linux 5.8 is based on Red Hat. Can you provide more information or logs about your problem? – c4f4t0r Feb 28 '14 at 23:49

1 Answer

I've had experiences of multipathd faltering with invalid/stale entries for SCSI devices that are no longer visible to the host. (Does your multipath -ll output say failed faulty for any entries?)
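A rough sketch of that check, in case it helps. The helper name count_failed_paths is my own, and the exact formatting of path states differs between multipath-tools versions (older releases print something like [failed][faulty]), so the pattern is deliberately loose:

```shell
# Hypothetical helper: count the path lines on stdin that are reported
# as failed and faulty. It reads stdin so it can be fed from multipath -ll.
count_failed_paths() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function's status clean.
    grep -c 'failed.*faulty' || true
}

# Usage (as root):
#   multipath -ll | count_failed_paths
```

Anything other than 0 here before you rescan suggests stale paths are still registered.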

The process for removing a LUN would be (replace values in <...> with actuals):

  1. Remove the visibility of the LUN to the host by editing the HostGroup on the SAN box
  2. Remove the SCSI entry(ies) for that device: echo 1 > /sys/block/<sdx>/device/delete
  3. Remove the multipath entry for that device: multipath -f /dev/mapper/<mpath0>
  4. If that fails (probably due to queued I/O, which is a bad sign in itself), then try forcing the removal:
    • Tell the multipath device to fail all I/O instead of queuing it: dmsetup message <mpath0> 0 "fail_if_no_path"
    • Wait till the timeout occurs (look for the timeout value in multipath.conf under polling_interval)
    • Force removal of the device using: dmsetup remove <mpath0> --force
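
Steps 2 to 4 above could be wrapped up roughly like this. It is only a sketch: remove_lun, run, DRY_RUN and WAIT_SECS are names I made up, the 10-second default wait is a placeholder rather than a recommended value, and step 1 (the SAN side) still has to be done by hand. By default it only prints what it would do.

```shell
# DRY_RUN=1 (the default in this sketch) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# remove_lun <mpath-name> <sd-device>...
remove_lun() {
    mpath="$1"; shift
    # Step 2: delete the SCSI entry for every path of the LUN.
    for sd in "$@"; do
        run sh -c "echo 1 > /sys/block/$sd/device/delete"
    done
    # Step 3: flush the multipath map.
    if ! run multipath -f "$mpath"; then
        # Step 4: stop queuing I/O, wait for the timeout, then force removal.
        run dmsetup message "$mpath" 0 "fail_if_no_path"
        run sleep "${WAIT_SECS:-10}"
        run dmsetup remove --force "$mpath"
    fi
}
```

For example, `remove_lun mpath0 sdb sdc` prints the three commands it would run; set DRY_RUN=0 (as root) to actually execute them. In dry-run mode the forced-removal branch is never shown, because nothing fails.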

I suspect the root cause lies in the removal step, and it only shows up as a symptom when you scan for new LUNs.

dotbugfix