
Here is the setup:

  • 70TB JBOD as part of a SAN, split into 5 LUNs (4x15TB and 1x10TB).
  • A file server running CentOS 7 connects to the 5 LUNs via iSCSI (iscsiadm) across two network paths, for a total of 10 paths.
  • dm-multipath aggregates these iSCSI paths into single devices, i.e. /dev/mapper/mpath*.
  • Build physical volumes using pvcreate on the above dm-multipath devices.
  • Build a single volume group using all physical volumes.
  • Build multiple logical volumes to desired capacity.
  • Format logical volumes as XFS using mkfs.xfs.
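The steps above can be sketched as follows. This is a minimal sketch only: the portal addresses, VG/LV names (vg_san, data1_lv, data2_lv), and sizes are placeholders, not the actual values from this system.

```shell
# Discover and log in to the iSCSI targets over both network paths
# (10.0.0.1 / 10.0.1.1 are placeholder portal addresses)
iscsiadm -m discovery -t sendtargets -p 10.0.0.1
iscsiadm -m discovery -t sendtargets -p 10.0.1.1
iscsiadm -m node --loginall=all

# dm-multipath coalesces the 10 paths into 5 /dev/mapper/mpath* devices
multipath -ll

# LVM stack: PVs on the multipath devices, one VG, multiple LVs
pvcreate /dev/mapper/mpath[a-e]
vgcreate vg_san /dev/mapper/mpath[a-e]
lvcreate -L 30T -n data1_lv vg_san
lvcreate -l 100%FREE -n data2_lv vg_san

# Format the logical volumes as XFS
mkfs.xfs /dev/vg_san/data1_lv
mkfs.xfs /dev/vg_san/data2_lv
```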

Now, there was an issue with the SAN which necessitated maintenance (upgrade of controller firmware) and thus I rebooted the file server to ensure that nothing got screwy when the SAN came back online.

Upon reboot, I was able to reconnect to the SAN and mount the filesystems. They are operating normally.

However, after the reboot of the file server, I noticed that the LVM information for these filesystems does not appear, i.e. pvdisplay, vgdisplay, and lvdisplay report only the local disk on the file server.

The VG and LVs do appear in /dev:

/dev/vg_${VG}
/dev/vg_${VG}/${LV1}_lv
/dev/vg_${VG}/${LV2}_lv
/dev/disk/by-id/dm-name-${VG}-${LV1}_lv
/dev/disk/by-id/dm-name-${VG}-${LV2}_lv
/dev/mapper/vg_${VG}-${LV1}_lv
/dev/mapper/vg_${VG}-${LV2}_lv

And they do appear using pvs -a but have zero extents:

[root@file-server /]# pvs -a
PV                                     VG     Fmt  Attr PSize   PFree
/dev/centos/home                                   ---       0     0
/dev/centos/root                                   ---       0     0
/dev/centos/swap                                   ---       0     0
/dev/mapper/mpatha                                 ---       0     0
/dev/mapper/mpathb                                 ---       0     0
/dev/mapper/mpathc                                 ---       0     0
...
/dev/sda                                           ---       0     0
/dev/sda1                                          ---       0     0
/dev/sda2                              centos lvm2 a--  273.80g 4.00m
/dev/sdb                                           ---       0     0
/dev/sdc                                           ---       0     0
/dev/sdd                                           ---       0     0
/dev/sde                                           ---       0     0
...
/dev/vg_${VG}/${LV1}_lv                            ---       0     0
/dev/vg_${VG}/${LV2}_lv                            ---       0     0

They also appear with dmsetup info -c:

[root@file-server /]# dmsetup info -c
Name                              Maj Min Stat Open Targ Event  UUID
mpathe                            253   6 L--w    1    1      1 mpath-27f3164e4727f3bc5
mpathd                            253   5 L--w    1    1      1 mpath-2b3c12e7d9acc5f25
mpathc                            253   4 L--w    1    1      1 mpath-232eb560378e8ec53
mpathb                            253   7 L--w    1    1      1 mpath-218029135ad1e514a
mpatha                            253   3 L--w    1    1      1 mpath-20123b6d74acce549
vg_${VG}-${LV1}_lv    253  16 L--w    1    1      0 LVM-6DoB20ypbwcGOoRHiX0t8wKAY3oC9BXtSGzQ1wy8fGa9okuQm1NxtPCHnmt0dtO6
vg_${VG}-${LV2}_lv   253  17 L--w    1    3      0 LVM-6DoB20ypbwcGOoRHiX0t8wKAY3oC9BXtmgFlfK9Bilo3IAWxjqwR7dUA8Oq0Fu70
mpathj                            253  15 L--w    1    1      1 mpath-266772bd8af26c781
centos-home                       253   2 L--w    1    1      0 LVM-GAWmujV5zkPn9byt74PY7byRJUWi8UmYSqsQjkt2uTDQ1q5Do38GXYynZhTiLhYw
mpathi                            253  14 L--w    1    1      1 mpath-254a27729bfbfc8c6
mpathh                            253  13 L--w    1    1      1 mpath-2a0ff1a2db7f22f00
mpathg                            253  12 L--w    1    1      1 mpath-27a5ce08413f48f13
mpathf                            253  11 L--w    1    1      1 mpath-2d19e7002c7a41667
centos-swap                       253   1 L--w    2    1      0 LVM-GAWmujV5zkPn9byt74PY7byRJUWi8UmYtA03QjyV1IlWWk9Nz9cHJFKN16SJZ0T5
centos-root                       253   0 L--w    1    1      0 LVM-GAWmujV5zkPn9byt74PY7byRJUWi8UmYCMmaP0envGMf3gk8JhcyoQIQPGmjrL6w

How do I reinstate the LVM metadata? Is it simply a matter of vgcfgrestore, as outlined here:

https://www.centos.org/docs/5/html/5.2/Cluster_Logical_Volume_Manager/mdatarecover.html

I appear to have a backup in /etc/lvm/backup.

I hesitate to fiddle with this in case I lose any data on the file systems. Recovery is possible, but not without system downtime and delay.
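For reference, the vgcfgrestore procedure from that document would look roughly like this. This is a sketch under assumptions: vg_${VG} stands for the actual volume group name, and the backup file path assumes the default /etc/lvm/backup location.

```shell
# List available metadata backups/archives for the VG
vgcfgrestore --list vg_${VG}

# Dry run first: --test makes no changes to on-disk metadata
vgcfgrestore --test -f /etc/lvm/backup/vg_${VG} vg_${VG}

# If the dry run looks sane, restore for real and re-activate the VG
vgcfgrestore -f /etc/lvm/backup/vg_${VG} vg_${VG}
vgchange -ay vg_${VG}
```

Note that vgcfgrestore rewrites on-disk metadata, so the dry run and a verified backup are worth having before touching anything.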

EDIT: Output from pvs, lvs, and vgs below:

[root@file-server ~]# pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/sda2  centos lvm2 a--  273.80g 4.00m
[root@file-server ~]# lvs
  LV   VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home centos -wi-ao----  46.57g
  root centos -wi-ao---- 221.64g
  swap centos -wi-ao----   5.59g
[root@file-server ~]# vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  centos   1   3   0 wz--n- 273.80g 4.00m
Vince
    Can you show the output of `lsblk`, `blkid /dev/mapper/mpatha [b] [c] ...` and `mount` ? – shodanshok Aug 23 '16 at 16:46
  • @shodanshok: here are links: http://pastebin.com/DJgXRgks, http://pastebin.com/7FgMZn2v, and http://pastebin.com/pQQGU5mp. I will update post with truncated versions if solution works out. – Vince Aug 23 '16 at 17:08
  • @shodanshok: I started looking at the `blkid /dev/sd*` as well and they match what I see in `/etc/lvm/backup/`. – Vince Aug 23 '16 at 17:10
  • 1
    Ok, the block devices seem ok. Can you add the output of `pvscan -vvv` and `pvs -vvv` ? – shodanshok Aug 23 '16 at 19:22
  • Thanks. Here you go. pvscan: http://pastebin.com/NZ3SyYMv. pvs: http://pastebin.com/N9Cr5Xbf. – Vince Aug 23 '16 at 19:44
  • 2
    It seems a `lvmetad` problem, and I suspect it has nothing to do with the SAN upgrade, rather with the reboot. Try to stop `lvmetad` issuing the commands `systemctl stop lvmetad.service; systemctl stop lvmetad.socket` and retry with `pvscan; pvs` (and paste output). If this is not sufficient, can you update the machine (`yum update`) and reboot? – shodanshok Aug 23 '16 at 21:22
  • after `systemctl stop` commands: `pvscan`: http://pastebin.com/H2dtxWLc, `pvs`: http://pastebin.com/fnwPmhHX. Also `(pv|vg|lv)display` show the correct output now. Thank you! If I count upvote you 1000x I would :) Please feel free to post an official answer and I will promptly accept it. – Vince Aug 24 '16 at 00:08
  • 1
    Before writing the answer, we should re-enable `lvmetad`. Please issue `pvscan --cache`, re-enable `lvmetad` with `systemctl start lvmetad.service; systemctl start lvmetad.socket`, and run `pvscan --cache` again. Finally run `pvs` and check whether the volumes continue to be recognized. – shodanshok Aug 24 '16 at 06:46
  • Seems to work: http://pastebin.com/S7Cqx9p8. – Vince Aug 24 '16 at 12:26
  • Ok, I'll post the answer then. – shodanshok Aug 24 '16 at 12:50

1 Answer


As diagnosed in the comments, the problem had nothing to do with the SAN upgrade; rather, the lvmetad daemon was returning stale/wrong information.

This was confirmed by stopping lvmetad (systemctl stop lvmetad.service; systemctl stop lvmetad.socket) and issuing pvs, which, by directly analyzing the block devices, returned correct information.

The permanent fix was to update the lvmetad cache by informing it that something had changed. This was accomplished by running pvscan --cache, re-enabling lvmetad (systemctl start lvmetad.service; systemctl start lvmetad.socket), and finally running another pvscan --cache.

After that, normal pvs (with lvmetad active) returned correct data.
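The full recovery sequence from the comments, collected in one place. Unit names follow the thread as written; on some CentOS 7 installs the units are named lvm2-lvmetad.service / lvm2-lvmetad.socket instead, so adjust accordingly.

```shell
# 1. Stop lvmetad so the LVM tools scan the block devices directly
systemctl stop lvmetad.service lvmetad.socket

# 2. Verify that pvs/vgs/lvs now report the SAN volumes correctly
pvs; vgs; lvs

# 3. Rebuild the metadata cache, re-enable lvmetad, then refresh again
pvscan --cache
systemctl start lvmetad.service lvmetad.socket
pvscan --cache

# 4. Confirm the volumes are still recognized with lvmetad active
pvs
```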

shodanshok