2

We have a system used for a GIS database (Postgres as the underlying engine) whose database drive is a software RAID 5 array of 4x 2TB Samsung 870 EVO SATA SSDs. A nightly backup script dumps the tables to a local temporary directory, gzips them, and transfers them to a separate machine (with mv). Normally the backup starts at 18:30 and runs until 05:00; yes, it's a big backup.

A month or so ago the external system fell offline, so the mv step stopped working and the temporary storage area filled up with unmoved files. After the external system was repaired, we noticed the temp area was full and deleted everything in it - about 3.5TB of files. About two weeks ago we noticed that the daily backup was not completing until 10:00. My suspicion is that things have slowed down because the space freed by those deletions was never discarded (TRIMmed) on the SSDs, so every new temp file the backup writes has to wait for flash blocks to be erased before they can be rewritten.
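The backup script itself is essentially a per-table pg_dump piped through gzip, with mv doing the hand-off; a simplified sketch (the paths, database name, and remote mount below are placeholders, not the real script):

#!/bin/sh
# Simplified sketch of the nightly backup flow; TMPDIR, REMOTE and the
# database name "gisdb" are placeholders, not the real values.
TMPDIR=/db/backup-tmp
REMOTE=/mnt/backup-server

for table in $(psql -At -c "SELECT tablename FROM pg_tables WHERE schemaname = 'public'" gisdb); do
    pg_dump -t "$table" gisdb | gzip > "$TMPDIR/$table.sql.gz"
done

# When the remote machine is unreachable this step fails and the
# compressed dumps accumulate in $TMPDIR.
mv "$TMPDIR"/*.sql.gz "$REMOTE"/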

`fstrim -av` does not print anything, which suggests that no mounted filesystem is reporting DISCARD support.
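One way to see where in the stack discard support disappears is `lsblk --discard`, which reports the discard granularity and limits for every block device (the member disks, md0, and the LVs):

# DISC-GRAN and DISC-MAX are zero for any layer that does not accept discards.
lsblk --discard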

This system does have LVM on top of the RAID array. The database and temp directories are in an ext4 filesystem (it was ext2 originally, but stuff happened) on its own LV, mounted at /db; `fstrim -v /db` reports "File system does not support DISCARD".
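The same information is exposed per layer in sysfs, which makes it easier to pin down whether the disks, md0, or the LV is the layer dropping discards (dm-2 is an assumption based on the LV's block device number 253:2 shown further down):

# Discard limits for each layer: member disks, the md array, and the LV.
for dev in sda sdb sdc sdd md0 dm-2; do
    printf '%-5s granularity=%s max_bytes=%s\n' "$dev" \
        "$(cat /sys/block/$dev/queue/discard_granularity)" \
        "$(cat /sys/block/$dev/queue/discard_max_bytes)"
done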

OS version: Debian Linux 8 (jessie), Linux 3.16.0-4-amd64 x86_64

RAID information:

root@local-database:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sda1[7] sdd1[4] sdc1[5] sdb1[6]
      5860147200 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 1/2 pages [4KB], 524288KB chunk

root@local-database:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Sun Dec 27 17:55:35 2015
     Raid Level : raid5
     Array Size : 5860147200 (5588.67 GiB 6000.79 GB)
  Used Dev Size : 1953382400 (1862.89 GiB 2000.26 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Aug  8 14:07:27 2023
          State : clean 
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : local-database:0  (local to host local-database)
           UUID : 18d38d9a:daaa0652:8e43a020:133e5a4f
         Events : 53431

    Number   Major   Minor   RaidDevice State
       7       8        1        0      active sync   /dev/sda1
       6       8       17        1      active sync   /dev/sdb1
       5       8       33        2      active sync   /dev/sdc1
       4       8       49        3      active sync   /dev/sdd1

Information about the specific LV used for the database and temp areas:

  --- Logical volume ---
  LV Path                /dev/MainDisk/postgres
  LV Name                postgres
  VG Name                MainDisk
  LV UUID                TpKgGe-oHKS-Y341-029v-jkir-lJn8-jo8xmZ
  LV Write Access        read/write
  LV Creation host, time local-database, 2015-12-27 18:04:04 -0800
  LV Status              available
  # open                 1
  LV Size                4.78 TiB
  Current LE             1251942
  Segments               4
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     6144
  Block device           253:2

PV information:

root@local-database:~# pvdisplay
  --- Physical volume ---
  PV Name               /dev/md0
  VG Name               MainDisk
  PV Size               5.46 TiB / not usable 2.50 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              1430699
  Free PE               121538
  Allocated PE          1309161
  PV UUID               N3tcTa-LBw2-D8gI-6Jg4-9v3T-KWn2-5CDVzK

I would really like to get the backup time back down to 11 hours, so that we're no longer colliding with actual working hours. Is there something in the TRIM options that I can do here, or is there something else I've missed? I have checked that the database did not suddenly grow new tables or grow 50% overnight; there are no network connection problems, and as far as I can see nothing odd happened to the network or the external server just before the backup started taking 16 hours. Is there anything else I'm missing?
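To confirm that raw write throughput on /db has actually degraded (rather than the database or the network being the problem), a quick direct-I/O write test during a quiet window should tell the story; the test file path here is only an example:

# Write 4 GiB with direct I/O, bypassing the page cache, then remove the
# test file. Compare the MB/s figure against a known-good baseline.
dd if=/dev/zero of=/db/tmp/write-test bs=1M count=4096 oflag=direct conv=fsync
rm /db/tmp/write-test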

Edit due to comments: The actual SSDs are only a year and a half old; they replaced the original 250GB SSDs in April 2022. (We ran out of space, and the RAID array, LV, and filesystem were expanded in place.) We're using software RAID - bog-standard Linux mdadm.

Edit in response to comments:

root@local-database:~# lsblk -d
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda    8:0    0  1.8T  0 disk
sdb    8:16   0  1.8T  0 disk
sdc    8:32   0  1.8T  0 disk
sdd    8:48   0  1.8T  0 disk

root@local-database:~# cat /sys/module/raid456/parameters/devices_handle_discard_safely
N

root@local-database:~# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 2
Model name:            AMD FX(tm)-8320 Eight-Core Processor
Stepping:              0
CPU MHz:               1400.000
CPU max MHz:           3500.0000
CPU min MHz:           1400.0000
BogoMIPS:              7023.19
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7

According to an article linked by Nikita Kipriyanov in the comments below, Samsung 870 EVOs have serious trouble with queued TRIM on AMD SATA hardware, which this clearly is. So that would seem to be that. I guess we'll just have to live with it...
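That said, two checks are worth running before giving up entirely: dmesg shows whether libata has already applied a queued-TRIM/NCQ quirk to these drives, and smartctl (from smartmontools) reports the firmware revision that the kernel's quirk entries key on. A 3.16-era kernel almost certainly predates the 870 EVO quirk entries, so silence in dmesg doesn't by itself mean the combination is safe.

# Has libata applied a quirk to these drives? Look for messages about
# queued TRIM or NCQ being disabled.
dmesg | grep -iE 'queued trim|ncq'

# Model and firmware revision of each member disk; the kernel's quirk
# list matches on these strings. Requires smartmontools.
for d in /dev/sd[a-d]; do
    smartctl -i "$d" | grep -E 'Device Model|Firmware Version'
done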

tsc_chazz
  • Often the volume that a RAID controller presents does not expose the capabilities of the underlying disks - neither S.M.A.R.T. nor TRIM - even when the underlying disks do support them. Typically the applicable RAID management utility will let you query stats from the underlying disks. But on a system that appears to be 8 years old, odds are that your SSDs are reaching the end of their life. – HBruijn Aug 09 '23 at 05:18
  • The SSDs are in fact only a year and a half old. When the RAID array and the LV were created 8 years ago the SSDs available were 250G; they were replaced in place with 2TB SSDs in April 2022 and the array and filesystem were then expanded to fill all the space. – tsc_chazz Aug 09 '23 at 06:03
  • please show `lsblk -d` and `modinfo raid456`. The latter has a parameter which enables use of discard (trim) on said RAID levels; notice that if the devices don't do it properly your RAID will be screwed. Samsung SSDs were so notorious for misbehaving queued TRIM in the past that [quirks were added](https://www.phoronix.com/news/Samsung-860-870-More-Quirks) to the kernel to disable it on some models; maybe Linux is just putting you and your data on the safe side. – Nikita Kipriyanov Aug 09 '23 at 16:13
  • oh sorry, I meant `cat /sys/module/raid456/parameters/devices_handle_discard_safely`, of course; I know it exists, I wanted to see actual value on your system :) – Nikita Kipriyanov Aug 09 '23 at 16:20

1 Answer

2

You need to enable discard support in /etc/lvm.conf (issue_discards=1)

I can't remember if this also needs to be set for md, but there's no mention in my local man pages.
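Roughly, the two settings under discussion would look like this - a sketch only; see the comments below on whether enabling discard is safe on these particular drives, and note that the raid456 parameter may not affect an already-running array until it is re-assembled or the machine is rebooted:

# /etc/lvm/lvm.conf - lets LVM issue discards to the PV when LV space is
# freed (lvremove/lvreduce); fstrim through device-mapper is separate.
#   devices {
#       issue_discards = 1
#   }

# md raid456 - declare that the member SSDs handle discard safely, which
# is what allows the md0 layer to accept discards at all:
echo Y > /sys/module/raid456/parameters/devices_handle_discard_safely

# Persist across reboots as a module option (update the initramfs if
# raid456 is loaded from it):
echo 'options raid456 devices_handle_discard_safely=Y' > /etc/modprobe.d/raid456-discard.conf
update-initramfs -u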

symcbean
  • Given Nikita Kipriyanov's comments above about the 870 and NCQ quirkiness, I'm going to have to do a little more research before I do that... but thank you. – tsc_chazz Aug 09 '23 at 17:09
  • This is a valid suggestion, despite my comments. See `lsblk -d`: if the /dev/sdX devices have zero discard values, check your `dmesg` for notices that Linux disabled discard on these SSDs; if /dev/sdX have non-zero discard values but MD has zero, you have to tune MD, likely the parameter I suggested; if the MD device has non-zero values but the LVs have zero, it's an LVM issue_discards problem. – Nikita Kipriyanov Aug 09 '23 at 17:21
  • In fact it would have to be set in `/etc/lvm/lvm.conf` - it's set 0 there now - and we'd have to set `/sys/module/raid456/parameters/devices_handle_discard_safely` to `Y`, but given that this is AMD hardware, and that enabling NCQ on this hardware is... probably dangerous, according to the article, I'm not prepared to risk it. I will tag this as the answer, though, as it clearly would be. – tsc_chazz Aug 09 '23 at 19:03