
We are evaluating SPDK as an internal framework for building a data recorder with NVMe devices.

Hard disks and SSDs have long exposed health data, including package temperature, through smartctl, and it looks like smartctl is now smart enough to handle NVMe devices as well. However, once SPDK is "setup", the kernel driver that smartctl relies on is no longer attached to those devices, so smartctl stops working on them.

I'm finding references to "temperature" thresholds in the spec, but I am not finding a way to read the current device package temperature.

The SPDK under Linux looks like a nice performance package, but if it blocks getting basic health information on the underlying hardware, then it's a non-starter.

Nufosmatic

1 Answer


I got my mind right this morning and I thought I'd share:

  • In "examples/nvme", there exists "identify" which provides much of the "health" information one would usually get from "smartctl".
  • If you naively attempt to run "identify" concurrently with "perf", you will discover that you can run one or the other, but not both: the second process fails with a complaint about "claiming" a device.
  • If you look at the command options, you will find "shared memory ID", typically "-i ID", which names a shared memory ID that multiple processes can use to access the device concurrently. You can now run "perf -i ID ..." and then run "identify -i ID ..." with the same ID and, for instance, watch the temperature on the packages rise over time.
  • If you look at the code for "nvme/hello_world", you will find that spdk_env_opts has a field "shm_id". This is apparently what gets populated from the above "-i ID" option on the command line of these other examples. If you fix up "hello_world" to set shm_id = -1 (the default, meaning no shared memory), then capture a command-line option and update this field to the ID value, you will be able to get "hello_world" to work alongside "perf" and/or "identify".
  • hello_world could be a place to make a simpler temperature sensor (using HEALTH message as the data source), or to include health sensing in a larger application.
  • This process still gets blivits in the involved processes. I haven't figured this out [yet].
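
For the temperature-sensor idea above, here is a minimal sketch of decoding the temperature once you have the raw SMART / Health Information log page (Log ID 02h) in a buffer. It relies only on the log page layout as I understand it from the NVMe spec: Composite Temperature sits in bytes 1 and 2, little-endian, in kelvins. The function names and HEALTH_LOG_PAGE_SIZE are made up for illustration and are not SPDK identifiers. Getting the buffer filled is up to you; in SPDK I believe that is done with spdk_nvme_ctrlr_cmd_get_log_page() and SPDK_NVME_LOG_HEALTH_INFORMATION, as the "identify" example does, but check that against your SPDK version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: size in bytes of the SMART / Health Information log page. */
#define HEALTH_LOG_PAGE_SIZE 512

/*
 * Assumption: `page` points at a raw health log page of at least
 * HEALTH_LOG_PAGE_SIZE bytes. Composite Temperature occupies bytes 1..2,
 * little-endian, in kelvins.
 */
static uint16_t health_composite_temp_kelvin(const uint8_t *page)
{
    assert(page != NULL);
    return (uint16_t)(page[1] | ((uint16_t)page[2] << 8));
}

/* Whole degrees Celsius; the device only reports integer kelvins. */
static int health_composite_temp_celsius(const uint8_t *page)
{
    return (int)health_composite_temp_kelvin(page) - 273;
}
```

A poller would fetch the log page periodically (with the same shared memory ID as the process doing I/O) and pass the buffer to health_composite_temp_celsius().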
Nufosmatic