Can't reclaim lvm "thin pool" space

Question

The thin pool reports far more used space than the LVM thin volume in it, but that space does not appear to be actually in use.

Previously, the metadata area filled up and I expanded it. After that I ran into an "lvm transaction id mismatch" error, which I resolved with vgcfgbackup -> change the transaction id -> vgcfgrestore.

The unreclaimed thin pool space appeared after the vgcfgrestore. Deleting snapshots and running fstrim on the mounted LVM volumes did not help either.

Any ideas for solving this problem?

$ lvs -a vg0 -o +discards
LV                 VG      Attr       LSize   Pool       Origin    Data%  Meta%  Move Log Cpy%Sync Convert Discards
  20221101.120002    vg0 Vwi-aotz-k  15.00t tpool0 tvol0 29.13                                               passdown
  20221101.180001    vg0 Vwi-aotz-k  15.00t tpool0 tvol0 29.13                                               passdown
  20221102.000001    vg0 Vwi-aotz-k  15.00t tpool0 tvol0 29.13                                               passdown
  20221102.060001    vg0 Vwi-aotz-k  15.00t tpool0 tvol0 29.13                                               passdown
  20221102.120001    vg0 Vwi-aotz-k  15.00t tpool0 tvol0 29.13                                               passdown
  tpool0             vg0 twi-aotz--  16.00t                          90.86  0.59                             passdown
  [tpool0_tdata]     vg0 Twi-ao----  16.00t                                                                      
  [tpool0_tmeta]     vg0 ewi-ao---- <15.01g                                                                      
  [tpool0_tmeta]     vg0 ewi-ao---- <15.01g                                                                      
  tvol0              vg0 Vwi-aotz--  15.00t tpool0                   29.13                                   passdown
  [lvol0_pmspare]    vg0 ewi------- <15.01g                                                                      
  [lvol0_pmspare]    vg0 ewi------- <15.01g                                                                      
  [lvol0_pmspare]    vg0 ewi------- <15.01g 

$ dmsetup ls | grep vg0 | sort -k2 -V
vg0-tpool0_tmeta    (253:4)
vg0-tpool0_tdata    (253:5)
vg0-tpool0-tpool    (253:6)
vg0-tpool0          (253:7)
vg0-tvol0           (253:8)
vg0-20221102.000001 (253:16)
vg0-20221102.060001 (253:17)
vg0-20221102.120001 (253:18)
vg0-20221101.120002 (253:19)
vg0-20221101.180001 (253:20)

$ grep . /sys/block/dm-{4..8}/queue/discard_max_bytes 
/sys/block/dm-4/queue/discard_max_bytes:0
/sys/block/dm-5/queue/discard_max_bytes:0
/sys/block/dm-6/queue/discard_max_bytes:0
/sys/block/dm-7/queue/discard_max_bytes:0
/sys/block/dm-8/queue/discard_max_bytes:17179869184

PLIP · Answer 1 · 2022-11-24T01:29:57.463

This problem is now solved on my Linux box. I never found the root cause, but I noticed a transaction id mismatch between the vgcfg backup and the thin_dump output, and fixed it by making the transaction ids match.

I hope this helps someone.

!!Caution!!

If you follow the steps given below, you may overwrite important information in LVM and dm-device. The situation may get worse and may not be resolved. I'm not an expert on these issues. Before applying this solution, fully consider the risks and fully test it on another device, such as a virtual machine. Also consider that your situation may be different than mine.

I take no responsibility.

[thin dump]

$ dmsetup message vg0-tpool0-tpool 0 reserve_metadata_snap
$ thin_dump -m /dev/mapper/vg0-tpool0_tmeta > meta.backup.xml ; grep transaction meta.backup.xml
<superblock uuid="" time="86" transaction="169" flags="0" version="2" data_block_size="16384" nr_data_blocks="0">
  <device dev_id="1" mapped_blocks="574203" transaction="0" creation_time="0" snap_time="86">
  <device dev_id="56" mapped_blocks="1865793" transaction="108" creation_time="55" snap_time="55">
  <device dev_id="77" mapped_blocks="572719" transaction="154" creation_time="80" snap_time="80">
  <device dev_id="80" mapped_blocks="573481" transaction="162" creation_time="83" snap_time="83">
  <device dev_id="81" mapped_blocks="573838" transaction="164" creation_time="84" snap_time="84">
  <device dev_id="82" mapped_blocks="573845" transaction="166" creation_time="85" snap_time="85">
  <device dev_id="83" mapped_blocks="574074" transaction="168" creation_time="86" snap_time="86">
$ dmsetup message vg0-tpool0-tpool 0 release_metadata_snap

[vg config] : transaction_id 108 is missing here

$ vgcfgbackup vg0 -f vg0.backup.cfg ; grep transaction vg0.backup.cfg
                transaction_id = 169
                transaction_id = 0
                transaction_id = 154
                transaction_id = 162
                transaction_id = 164
                transaction_id = 166
                transaction_id = 168
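
The comparison between the two listings above can also be done mechanically instead of by eye. This is only a minimal sketch, assuming a grep that supports -o (GNU or BSD) and the two files produced by the commands above. txids and missing_txids are hypothetical helper names made up for illustration, not LVM tools.

```shell
# txids FILE: print the distinct transaction ids found in FILE.
# Matches both the thin_dump XML form (transaction="108") and the
# vgcfgbackup form (transaction_id = 108).
txids() {
    grep -oE 'transaction(_id)?[ ]*=[ ]*"?[0-9]+' "$1" \
        | grep -oE '[0-9]+$' \
        | sort -u
}

# missing_txids DUMP CFG: ids that appear in the thin_dump output
# but not in the vgcfg backup. Read-only; it changes nothing in LVM.
missing_txids() {
    _dump_ids=$(mktemp)
    _cfg_ids=$(mktemp)
    txids "$1" > "$_dump_ids"   # comm needs sorted input; sort -u provides it
    txids "$2" > "$_cfg_ids"
    comm -23 "$_dump_ids" "$_cfg_ids"
    rm -f "$_dump_ids" "$_cfg_ids"
}
```

Run as `missing_txids meta.backup.xml vg0.backup.cfg`. With the two listings shown above it should report 108, the id of the device that exists in the pool metadata but not in the VG config.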

[repair]

Find the vgcfg backup in /etc/lvm/archive that still contains transaction_id 108.

$ grep "transaction.*108" /etc/lvm/archive/* | cut -d: -f1
$ vi ${vgcfg_filename}
  ### block of vgcfg with transaction_id = 108 ###
  #### Copy this block and paste it to "vg0.backup.cfg" created in the previous process.
        20221101.000005 {
        ...
        transaction_id = 108
        ...
        }
        }
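
Instead of copying the block by hand in vi, it can be printed with a small awk filter. This is a rough sketch only: lv_block is a hypothetical helper name, and it assumes the LV name and its opening brace are on the same line and that no braces appear inside quoted strings or comments within the block. Check the output by eye before pasting it anywhere.

```shell
# lv_block NAME FILE: print the "NAME { ... }" block from a vgcfg-style file.
# Brace depth is tracked so nested blocks inside the LV block are included.
lv_block() {
    awk -v name="$1" '
        # Appending "" forces a string comparison; an LV name such as
        # 20221101.000005 would otherwise be compared as a number.
        !inside && ($1 "") == (name "") && $2 == "{" { inside = 1 }
        inside {
            print
            depth += gsub(/\{/, "{") - gsub(/\}/, "}")
            if (depth == 0) exit
        }
    ' "$2"
}
```

Usage would look like `lv_block 20221101.000005 ${vgcfg_filename}`, where ${vgcfg_filename} is the archive file found by the grep above.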

Modify vg0.backup.cfg

$ vi vg0.backup.cfg
vg0 {    
        ...
        tvol0 {
        ...
    ### ADD "transaction_id = 108" block
        20221101.000005 {
        ...
                transaction_id = 108
        ...
        }
    ###
        ...
}

Restore vg from modified vg0.backup.cfg

$ yes | vgcfgrestore vg0 -f vg0.backup.cfg --force
...
Volume group vg0 has active volume: 20221101.000005.
...


$ lvremove /dev/vg0/20221101.000005 ## transaction_id 108
  Logical volume "20221101.000005" successfully removed

$ lvs -a vg0
  LV                 VG      Attr       LSize   Pool       Origin    Data%  Meta%  Move Log Cpy%Sync Convert
  ...
  tpool0             vg0     twi-aotz--  16.00t                      27.82  0.29                            
  ...
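
To put the Data% figures from the lvs outputs in absolute terms, here is a small arithmetic sketch. used_tib is a hypothetical helper name; the sizes and percentages are the ones reported by lvs in the question and in this answer.

```shell
# used_tib SIZE_TIB PERCENT: absolute usage in TiB for a given LSize and Data%
used_tib() {
    awk -v size="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", size * pct / 100 }'
}

used_tib 16 90.86   # tpool0 before the repair -> 14.54
used_tib 16 27.82   # tpool0 after the repair  -> 4.45
used_tib 15 29.13   # tvol0                    -> 4.37
```

After removing the orphaned volume, the pool usage (about 4.45 TiB) is close to what tvol0 itself maps (about 4.37 TiB).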
