
I rebooted my Ubuntu 20.10 server today and it suddenly started complaining that it cannot find one of the PVs backing the root LV.

After some digging around in the shell it dropped me into, I could see that the PV was indeed missing. I couldn't activate the VG without adding the `--activationmode partial` option.
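
For reference, this is roughly what I ran from that shell (the VG name is as it appears in the vgdisplay output below; exact invocations are from memory):

pvs                                               # the second PV was missing from this list
vgchange -ay ubuntu-vg                            # failed because of the missing PV
vgchange -ay --activationmode partial ubuntu-vg   # only this succeeded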

There are also some worrying messages that keep printing on the console, namely `ata2: softreset failed ...` and `ata2: SATA link down`.

Here are some images of that session: https://photos.app.goo.gl/r5FBfdY5XaPa5y9h9

I booted into a live Ubuntu desktop and continued exploring, where I promptly discovered that the PV now does exist, and I was able to activate and mount the VG without any issues (roughly as sketched after the dmesg output below). I also see the SATA messages in the live instance via dmesg, but there they don't keep repeating. The disk in question is an SSD. Here's the rest of the dmesg output about it.

[   50.228406] ata2: softreset failed (1st FIS failed)
[   50.943122] ata2: SATA link down (SStatus 0 SControl 300)
[   56.855151] ata2: SATA link down (SStatus 0 SControl 300)
[   56.855157] ata2.00: link offline, clearing class 1 to NONE
[   56.859920] ata2: limiting SATA link speed to 1.5 Gbps
[   57.731143] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[   57.737432] ata2.00: ATA-9: INTEL SSDSC2CT120A3, 300i, max UDMA/133
[   57.737436] ata2.00: 234441648 sectors, multi 16: LBA48 NCQ (depth 32), AA
[   57.747452] ata2.00: configured for UDMA/133
[   57.747606] scsi 1:0:0:0: Direct-Access     ATA      INTEL SSDSC2CT12 300i PQ: 0 ANSI: 5
[   57.752238] sd 1:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/112 GiB)
[   57.755084] sd 1:0:0:0: [sdc] Write Protect is off
[   57.755127] sd 1:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[   57.755444] sd 1:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   57.755515] sd 1:0:0:0: Attached scsi generic sg2 type 0
[   57.780008]  sdc: sdc1
[   57.780452] sd 1:0:0:0: [sdc] Attached SCSI disk
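
For completeness, activating and mounting from the live session went something like this (the mount point is just an example):

vgchange -ay ubuntu-vg                  # activates cleanly, no partial mode needed
mount /dev/ubuntu-vg/ubuntu-lv /mnt     # the root LV mounts without complaint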

And here's the output from a verbose `vgdisplay`:

root@ubuntu:~# vgdisplay -v
  /dev/sdb: open failed: No medium found
  /dev/sdb: open failed: No medium found
  --- Volume group ---
  VG Name               ubuntu-vg
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               229.52 GiB
  PE Size               4.00 MiB
  Total PE              58758
  Alloc PE / Size       58758 / 229.52 GiB
  Free  PE / Size       0 / 0   
  VG UUID               ddb9uT-0717-jSfz-phaq-N8il-4OFu-TqR3fG
   
  --- Logical volume ---
  LV Path                /dev/ubuntu-vg/ubuntu-lv
  LV Name                ubuntu-lv
  VG Name                ubuntu-vg
  LV UUID                nWtpix-WsV2-dT3v-RWtc-zPl1-6SdL-sSwIOB
  LV Write Access        read/write
  LV Creation host, time ubuntu-server, 2021-01-25 00:43:30 +0000
  LV Status              available
  # open                 0
  LV Size                229.52 GiB
  Current LE             58758
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
   
  --- Physical volumes ---
  PV Name               /dev/nvme0n1p3     
  PV UUID               55KfPo-ep2o-n3FB-stZz-65gO-J1Bz-Y9evX0
  PV Status             allocatable
  Total PE / Free PE    30141 / 0
   
  PV Name               /dev/sdc1     
  PV UUID               fPg1BI-COwe-n4YJ-Wo4F-c6I5-4f96-hk1oEn
  PV Status             allocatable
  Total PE / Free PE    28617 / 0
   

Having seen some posts relating that message to the SATA mode setting, I checked my BIOS for SATA settings, but I couldn't find a section for them. BIOS options have become a lot more complex since the last time I needed to dig in that deep, though!

Any pointers, please?

Declan Shanaghy

1 Answer


HDDs die. SSDs die too. If you care about your data and your uptime, implement RAID and continuously monitor its health.
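
For example, with Linux software RAID a quick health check could look like this (the array name is illustrative):

cat /proc/mdstat                     # quick overview of all arrays and their state
mdadm --detail /dev/md0              # per-array status, including failed or spare devices
mdadm --monitor --scan --daemonise   # mails on failures if MAILADDR is set in mdadm.conf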

Use `smartctl --all /dev/sdc` to see the device's status. You'll see SSD wear and other attributes, plus the most recent logged errors, and you'll be able to initiate a device self-test and see its results. It's wise to check this for all storage devices and to do that monitoring regularly. As far as I can remember, Ubuntu has this configured by default, so you are advised to check the system logs for SMART monitoring messages.
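
For example (device name as in your dmesg output; the short self-test runs in the background and takes a couple of minutes):

smartctl --all /dev/sdc          # full report: health status, attributes, error log
smartctl -t short /dev/sdc       # start a short self-test
smartctl -l selftest /dev/sdc    # read the self-test results once it finishes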

Nikita Kipriyanov
  • There doesn't seem to be anything wrong with the disk. Here's the output from smartctl. https://gist.github.com/declanshanaghy/9ececd360d1a6af828fc5fc32850b118 – Declan Shanaghy Feb 13 '21 at 16:02
  • I disagree. Parameter 181 Program_Fail_Cnt_Total has a value of 1, so some fault occurred; it also has a reallocated sector, which is strange. Notice also that while the device is capable of 3.0 Gbit/s, it currently runs at 1.5 Gbit/s. If the motherboard port is capable of delivering 3.0 Gbit/s, this slowdown may indicate a cabling problem. Also, I'd find Intel's datasheet for this SSD and read what all the parameters mean. For the Intel 540 series SSD, parameter 233 Media_Wearout_Indicator means "remaining life", and a value of 0 means the disk is completely worn out. – Nikita Kipriyanov Feb 13 '21 at 16:27
  • Thanks for the insight. I have no experience interpreting these reports, so I was mainly swayed by the line stating `SMART overall-health self-assessment test result: PASSED`. – Declan Shanaghy Feb 14 '21 at 06:34
  • Any ideas on why the PV cannot be found when booting from the root volume, but it works fine when I boot into a live image and activate it there? – Declan Shanaghy Feb 14 '21 at 06:41
  • I can only speculate that the single error counted in the table was a crash during exactly that faulty boot. That's *how* it didn't work. Why? Unknown. Does this repeat consistently? Very unusual. The PV cannot be found, but is the hosting device there? Did you try to read that device (sdc?) directly from the initramfs to check whether it works right away (try something like `cat /dev/sdc > /dev/null`)? Also, before applying such an extensive load, I'd back up anything valuable from it while it still allows that (e.g. from a live image); for example, free up enough space on another device and `pvmove` everything away from this SSD (a rough sketch follows below). – Nikita Kipriyanov Feb 14 '21 at 14:29
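
A rough sketch of that evacuation, assuming a replacement disk is available (the VG above shows 0 free PE, so a new PV has to be added first; /dev/sdX1 is a placeholder):

cat /dev/sdc > /dev/null         # read the whole device to see whether it fails under load
vgextend ubuntu-vg /dev/sdX1     # add the replacement PV to the VG
pvmove /dev/sdc1                 # migrate all allocated extents off the failing SSD
vgreduce ubuntu-vg /dev/sdc1     # then drop it from the VG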