EDIT: I have a volume group consisting of 5 RAID1 devices, grouped together into a single LVM logical volume and formatted with XFS. The 5th RAID device lost its RAID config (cat /proc/mdstat does not show anything for it). The two drives are still present (sdj and sdk), but they have no partitions. The LVM appeared to be happily using sdj up until recently (a pvscan showed the first 4 RAID1 devices + /dev/sdj). I removed the LVM from the fstab, rebooted, then ran xfs_check on the LV. It ran for about half an hour, then stopped with an error.

I tried rebooting again, and this time when it came up, the logical volume was no longer there. It is now looking for /dev/md5, which is gone (though it had been using /dev/sdj earlier). /dev/sdj was having read errors, but after replacing the SATA cable, those went away, so the drive appears to be fine for now.

Can I modify /etc/lvm/backup/dedvol, change the device to /dev/sdj, and do a vgcfgrestore? I could try doing a pvcreate --uuid KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ /dev/sdj to make it recognize the drive, but I'm afraid that would erase the data on it.
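For reference, this is roughly what I have in mind, based on my reading of the pvcreate(8) and vgcfgrestore(8) man pages. I have not run any of it yet, and I am not certain it is safe:

# Re-create the PV label on /dev/sdj with its old UUID, using the backup file
# so the metadata layout (pe_start etc.) matches the original. As far as I can
# tell, this only rewrites the label/metadata area, not the data area itself.
pvcreate --uuid KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ \
         --restorefile /etc/lvm/backup/dedvol /dev/sdj

# Then restore the volume group metadata and try to activate it:
vgcfgrestore --file /etc/lvm/backup/dedvol dedvol
vgchange -ay dedvol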

UPDATE: just changing the PV to point to /dev/sdj did not work:

vgcfgrestore --file /etc/lvm/backup/dedvol dedvol
  Couldn't find device with uuid 'KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ'.
  Cannot restore Volume Group dedvol with 1 PVs marked as missing.
  Restore failed.
pvscan
  /dev/sdj: read failed after 0 of 4096 at 0: Input/output error
  Couldn't find device with uuid 'KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ'.
  Couldn't find device with uuid 'KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ'.
  Couldn't find device with uuid 'KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ'.
  Couldn't find device with uuid 'KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ'.
  PV /dev/sdd2        VG VolGroup00   lvm2 [74.41 GB / 0    free]
  PV /dev/md2         VG dedvol       lvm2 [931.51 GB / 0    free]
  PV /dev/md3         VG dedvol       lvm2 [931.51 GB / 0    free]
  PV /dev/md0         VG dedvol       lvm2 [931.51 GB / 0    free]
  PV /dev/md4         VG dedvol       lvm2 [931.51 GB / 0    free]
  PV unknown device   VG dedvol       lvm2 [1.82 TB / 63.05 GB free]
  Total: 6 [5.53 TB] / in use: 6 [5.53 TB] / in no VG: 0 [0   ]
vgscan
  Reading all physical volumes.  This may take a while...
  /dev/sdj: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdj: read failed after 0 of 4096 at 2000398843904: Input/output error
  Found volume group "VolGroup00" using metadata type lvm2
  Found volume group "dedvol" using metadata type lvm2
vgdisplay dedvol
  --- Volume group ---
  VG Name               dedvol
  System ID             
  Format                lvm2
  Metadata Areas        5
  Metadata Sequence No  10
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                5
  Act PV                5
  VG Size               5.46 TB
  PE Size               4.00 MB
  Total PE              1430796
  Alloc PE / Size       1414656 / 5.40 TB
  Free  PE / Size       16140 / 63.05 GB
  VG UUID               o1U6Ll-5WH8-Pv7Z-Rtc4-1qYp-oiWA-cPD246
The relevant part of /etc/lvm/backup/dedvol:

dedvol {
        id = "o1U6Ll-5WH8-Pv7Z-Rtc4-1qYp-oiWA-cPD246"
        seqno = 10
        status = ["RESIZEABLE", "READ", "WRITE"]
        flags = []
        extent_size = 8192              # 4 Megabytes
        max_lv = 0
        max_pv = 0

        physical_volumes {

                pv0 {
                        id = "Msiee7-Zovu-VSJ3-Y2hR-uBVd-6PaT-Ho9v95"
                        device = "/dev/md2"     # Hint only

                        status = ["ALLOCATABLE"]
                        flags = []
                        dev_size = 1953519872   # 931.511 Gigabytes
                        pe_start = 384
                        pe_count = 238466       # 931.508 Gigabytes
                }

                pv1 {
                        id = "ZittCN-0x6L-cOsW-v1v4-atVN-fEWF-e3lqUe"
                        device = "/dev/md3"     # Hint only

                        status = ["ALLOCATABLE"]
                        flags = []
                        dev_size = 1953519872   # 931.511 Gigabytes
                        pe_start = 384
                        pe_count = 238466       # 931.508 Gigabytes
                }

                pv2 {
                        id = "NRNo0w-kgGr-dUxA-mWnl-bU5v-Wld0-XeKVLD"
                        device = "/dev/md0"     # Hint only

                        status = ["ALLOCATABLE"]
                        flags = []
                        dev_size = 1953519872   # 931.511 Gigabytes
                        pe_start = 384
                        pe_count = 238466       # 931.508 Gigabytes
                }

                pv3 {
                        id = "2EfLFr-JcRe-MusW-mfAs-WCct-u4iV-W0pmG3"
                        device = "/dev/md4"     # Hint only

                        status = ["ALLOCATABLE"]
                        flags = []
                        dev_size = 1953519872   # 931.511 Gigabytes
                        pe_start = 384
                        pe_count = 238466       # 931.508 Gigabytes
                }

                pv4 {
                        id = "KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ"
                        device = "/dev/md5"     # Hint only

                        status = ["ALLOCATABLE"]
                        flags = []
                        dev_size = 3907028992   # 1.81935 Terabytes
                        pe_start = 384
                        pe_count = 476932       # 1.81935 Terabytes
                }
        }
John P
  • § 4 RAID1 disks?! =8-O § Volumegroup on RAID + non-raid one disk?! =8-O § xfs_check on content with non-recovered bad hard disk?! =8-O — I guess you don't need the data on that VolGroup anyways – poige Jul 16 '11 at 03:02
  • It was set up with 5 RAID1 arrays. The volume group still says the 5th device is /dev/md5, but somehow the two disks in that mirrored array lost their "Linux RAID partition". I'm not sure how that happened, but pvscan started showing the 5th device as a raw disk. It was working, but apparently the xfs_check did something – John P Jul 16 '11 at 03:27
  • Your data are in trouble now, alas… – poige Jul 16 '11 at 05:02

1 Answer


Wow, your system is badly hosed. With enough care and attention, you could probably reconstruct the LVs in the volume group out of the LVM state archives in /etc/lvm/archive, but it'll be a lot quicker just to break out the backups (you do have backups, right?) and rebuild the system (this time with a more robust RAID setup -- if you've got 10 disks, why not just one big RAID-10?).
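If you do want to attempt that reconstruction, the general shape of it is to re-stamp the missing PV with its old UUID from an archived config, then restore that config and activate. This is only a sketch: the archive filename below is a placeholder for whatever vgcfgrestore --list actually shows on your machine, and none of it is guaranteed against further damage.

# See which archived metadata versions LVM has kept for the VG:
vgcfgrestore --list dedvol

# Re-create the PV label with the old UUID, using the chosen archive so the
# on-disk layout matches. "dedvol_NNNNN.vg" is a placeholder for whichever
# archive file looks like the last known-good state:
pvcreate --uuid KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ \
         --restorefile /etc/lvm/archive/dedvol_NNNNN.vg /dev/sdj

# Restore the VG metadata from that same archive, then activate:
vgcfgrestore --file /etc/lvm/archive/dedvol_NNNNN.vg dedvol
vgchange -ay dedvol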

To put your mind at ease: I doubt very much that running xfs_check on an LV could have done anything to corrupt the volume group. Far more likely, it was already hosed and you just hadn't noticed yet.

womble
  • It might still be recoverable :-) Here is what I see: the LVM was set up with 5 RAID1 devices (md0, md2-md5). Looking at the /etc/lvm/backup/dedvol config, I see all 5 of those listed. Somehow md5 disappeared. I don't see any trace of it, but the two drives that were in it are still there, just with no partitions. When I examine the first 512 blocks of the drive, I see the LVM config listed in it (a read-only way to take that peek is sketched below, after these comments). I think that if I could bring up the LVM with just that drive, I could get it up well enough to recover the data. – John P Jul 16 '11 at 05:56
  • Put additional information in your question, not a comment. – womble Jul 16 '11 at 05:59
  • the problem is, I don't know if I should try editing the backup file, pointing it at /dev/sdj and doing a vgcfgrestore, OR doing a pvcreate --uuid KZron2-pPTr-ZYeQ-PKXX-4Woq-6aNc-AG4rRJ /dev/sdj – John P Jul 16 '11 at 05:59
  • sorry - details added to main posting – John P Jul 16 '11 at 06:22
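The read-only peek at the on-disk LVM metadata mentioned in the comments above could look something like this (a sketch; it assumes the disk in question is /dev/sdj and only reads from it):

# Dump the first 1 MB of the disk and search it for the text of the LVM
# metadata area, which normally sits near the start of the PV:
dd if=/dev/sdj bs=512 count=2048 2>/dev/null | strings | grep -n -A4 dedvol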