I have two Kingston A400 120GB SSDs as cache in a Synology NAS, and they don't seem to support automatic offline data collection.

# smartctl -d sat -c /dev/sdc | grep -i "Auto Offline data collection" 
Auto Offline Data Collection: Disabled.  
No Auto Offline data collection support.
# smartctl -d sat -o on /dev/sdc
SMART Automatic Timers not supported
SMART Enable Automatic Offline failed: scsi error aborted command

Yet when I check the attributes marked as "Offline", the RAW_VALUE of one of them keeps changing (specifically 246 Total_Erase_Count), even if I don't run a manual offline data collection or any self-tests in between. I checked whether smartd was running, just in case, but it isn't. The same thing happens with the other, identical SSD.

Questions:

  1. What exactly does the offline data collection update? Does it just update the VALUE/WORST/THRESH columns in the attribute table?
  2. Do short or long self-tests update the SMART attribute data?

Output of smartctl -a:

=== START OF INFORMATION SECTION ===
Model Family:     Phison Driven SSDs
Device Model:     KINGSTON SA400S37120G
Serial Number:    [...]
LU WWN Device Id: [...]
Firmware Version: 03070009
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Fri Apr 12 01:55:30 2019 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x35) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   1) minutes.
Conveyance self-test routine
recommended polling time:        (   1) minutes.

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME                                                   FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate                                              0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours                                                   0x0032   100   100   000    Old_age   Always       -       710
 12 Power_Cycle_Count                                                0x0032   100   100   000    Old_age   Always       -       5
148 Unknown_Attribute                                                0x0000   100   100   000    Old_age   Offline      -       0
149 Unknown_Attribute                                                0x0000   100   100   000    Old_age   Offline      -       0
167 Unknown_Attribute                                                0x0000   100   100   000    Old_age   Offline      -       0
168 SATA_Phy_Error_Count                                             0x0012   100   100   000    Old_age   Always       -       0
169 Unknown_Attribute                                                0x0000   100   100   000    Old_age   Offline      -       65
170 Bad_Blk_Ct_Erl/Lat                                               0x0000   100   100   010    Old_age   Offline      -       0/78
172 Unknown_Attribute                                                0x0032   100   100   000    Old_age   Always       -       0
173 MaxAvgErase_Ct                                                   0x0000   100   100   000    Old_age   Offline      -       0
181 Program_Fail_Cnt_Total                                           0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count_Total                                           0x0000   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect                                               0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count                                            0x0012   100   100   000    Old_age   Always       -       1
194 Temperature_Celsius                                              0x0022   024   025   000    Old_age   Always       -       24 (Min/Max 24/25)
196 Not_In_Use                                                       0x0032   100   100   000    Old_age   Always       -       0
199 CRC_Error_Count                                                  0x0032   100   100   000    Old_age   Always       -       0
218 CRC_Error_Count                                                  0x0032   100   100   000    Old_age   Always       -       4
231 SSD_Life_Left                                                    0x0000   100   100   000    Old_age   Offline      -       0
233 Flash_Writes_GiB                                                 0x0032   100   100   000    Old_age   Always       -       396
241 Lifetime_Writes_GiB                                              0x0032   100   100   000    Old_age   Always       -       304
242 Lifetime_Reads_GiB                                               0x0032   100   100   000    Old_age   Always       -       228
244 Average_Erase_Count                                              0x0000   100   100   000    Old_age   Offline      -       2
245 Max_Erase_Count                                                  0x0000   100   100   000    Old_age   Offline      -       10
246 Total_Erase_Count                                                0x0000   100   100   000    Old_age   Offline      -       3827

SMART Error Log not supported

SMART Self-test Log not supported

Selective Self-tests/Logging not supported
Sam Martin
Bangaio

1 Answer

Short answer: SSDs hide their internal data collection and reporting behind complex controller and FTL firmware, so what you see at the SMART level is rarely a complete representation of their internal state. Don't worry about offline tests being apparently disabled: most probably the controller runs its own sanity checks and updates both online and offline attributes accordingly (unless it does not: some firmware purposely mangles SMART attributes, but this happens even with HDDs and there is nothing you can do about it).

Long answer: SMART offline data collection is a poorly defined way to collect data about the disk which, in principle, can degrade I/O performance because the specific tests/collections cannot truly run in parallel with user I/O. Hence the word "offline": the disk firmware is free to suspend user I/O during an offline attribute collection. For this reason, offline collection can be disabled entirely, explicitly requested by the user at a scheduled time or (if the disk supports it) run automatically on a programmed timer.
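As a rough illustration of how a drive advertises which of these modes it supports, the "Offline data collection capabilities" byte from `smartctl -c` (0x35 in the question) can be decoded bit by bit. This is only a sketch: the bit meanings are assumed to match what smartmontools prints for each bit, and the helper names `cap_bit` and `decode_offline_caps` are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print "LABEL: yes" or "LABEL: no" depending on
# whether MASK is set in VALUE. Usage: cap_bit VALUE MASK LABEL
cap_bit() {
    if [ $(( $1 & $2 )) -ne 0 ]; then
        echo "$3: yes"
    else
        echo "$3: no"
    fi
}

# Decode the offline data collection capabilities byte.
# Assumption: bit layout as interpreted by smartmontools.
decode_offline_caps() {
    cap_bit "$1" 0x01 "execute offline immediate"
    cap_bit "$1" 0x02 "automatic offline timer"
    cap_bit "$1" 0x04 "abort (rather than suspend) on new command"
    cap_bit "$1" 0x08 "offline surface scan"
    cap_bit "$1" 0x10 "self-test"
    cap_bit "$1" 0x20 "conveyance self-test"
    cap_bit "$1" 0x40 "selective self-test"
}

# Value reported by the SSD in the question.
decode_offline_caps 0x35
```

For 0x35 this reports the automatic timer and the surface scan as "no", which agrees with the text smartctl printed in the question. On such a drive a one-off collection can still be requested by hand, for instance with `smartctl -d sat -t offline /dev/sdc`.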

However, offline tests were never officially included in the ATA standard (albeit present in other storage-related standards), leaving the door open to (often undocumented) firmware-specific behavior.

For every disk I have used in the last 15+ years, offline tests really were "online" ones, with no performance drop during data collection. The only difference from online tests is that offline ones are collected on a firmware-dependent schedule (e.g. every 4 hours).
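To see which attributes the firmware labels as offline, and to check whether their raw values move anyway, the attribute table can be filtered on the FLAG column. The sketch below assumes that smartctl derives the "Always"/"Offline" text in the UPDATED column from bit 1 (0x0002) of FLAG; the function name `list_offline_attrs` is hypothetical.

```shell
#!/bin/sh
# Read a smartctl attribute table on stdin and print "ID NAME RAW_VALUE"
# for every attribute whose FLAG has the "online" bit (0x0002) clear.
# Assumption: that bit is what smartctl reports as Always vs Offline.
list_offline_attrs() {
    while read -r id name flag value worst thresh type updated failed raw; do
        # Skip headers and any line that is not an attribute row.
        case "$id" in
            ''|*[!0-9]*) continue ;;
        esac
        case "$flag" in
            0x*) ;;
            *) continue ;;
        esac
        if [ $(( $flag & 0x0002 )) -eq 0 ]; then
            echo "$id $name $raw"
        fi
    done
}

# Example usage (device name taken from the question):
#   smartctl -d sat -A /dev/sdc | list_offline_attrs
```

Saving the output twice, some hours apart and without running any test in between, and comparing the two files with `diff` would show which "offline" attributes the firmware refreshes on its own schedule.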

The only exception I have found is the offline surface scan, a specific offline sub-test which scans the entire platter surface (or the NAND chips, for SSDs) for defects. Being such an intensive test, it is reported separately and can sometimes be enabled or disabled selectively. However, most HDDs (and SSDs) report surface scan as not supported and implement a firmware- and model-specific scan instead. For example, most consumer HDDs do no surface scan at all, while enterprise disks automatically scan their surface even when SMART reports surface scan as disabled. SSDs are much more complex: the controller has to periodically scan the flash to rewrite marginal pages, so a surface scan has basically no meaning for them.

shodanshok