4

I'm getting unexpectedly low transactions per second on GCE's "Local SSD" option (compared with SSD Persistent Disk) using simple "pgbench" tests:

# With Local SSD
# /dev/mapper/vg0-data on /data type xfs (rw,noexec,noatime,attr2,inode64,noquota)
pg-dev-002:~$ pgbench -c 8 -j 2 -T 60 -U postgres
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 2
duration: 60 s
number of transactions actually processed: 10765
tps = 179.287875 (including connections establishing)
tps = 179.322407 (excluding connections establishing)

# With SSD Persistent Disk
# /dev/mapper/vg1-data on /data1 type xfs (rw,noexec,noatime,attr2,inode64,noquota)
pg-dev-002:/data$ pgbench -c 8 -j 2 -T 60 -U postgres
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 2
duration: 60 s
number of transactions actually processed: 62457
tps = 1040.806664 (including connections establishing)
tps = 1041.012782 (excluding connections establishing)

"fio" benchmarks show the advertised IOPS and throughput for Local SSD. However, executing "pg_test_fsync" on a Local SSD mount leads me to believe fsync latency is the culprit. The Local SSD numbers are after applying Google's IRQ script here:

# Local SSD
open_datasync                     319.738 ops/sec    3128 usecs/op
fdatasync                         321.963 ops/sec    3106 usecs/op

# Persistent SSD
open_datasync                    1570.305 ops/sec     637 usecs/op
fdatasync                        1561.469 ops/sec     640 usecs/op
  • Tested with Ubuntu 14.04 and Debian 7 images
  • Instance type: n1-highmem-4
  • Mount options are identical for both volume types

I haven't seen anything documented about fsync limitations on Local SSD, but I'm not sure where else to check or test.
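
If it helps anyone reproduce this, a small fio job along the following lines should isolate the flush latency the same way pg_test_fsync does, by issuing an fsync after every 8 kB write (the job name, directory, and size are just placeholders for my setup):

# Sketch of a sync-write latency test; assumes fio is installed and /data is the Local SSD mount
# 8k blocks roughly match Postgres WAL writes; --fsync=1 forces a flush after every write
fio --name=fsync-latency --directory=/data --ioengine=sync \
    --rw=write --bs=8k --size=256m --fsync=1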

BeepBoop
  • 283
  • 2
  • 10
  • 2
  • I would not call 320 ops/s especially slow. It probably mainly means that the SSD doesn't lie about `fsync` and actually does a reliable flush. If it doesn't have a supercapacitor or backup battery to let it return immediately, then 320 iops is quite believable. The persistent SSD numbers are on the edge of believable, at 0.5ms round trips, if it's truly flushing to persistent storage over a fast network. – Craig Ringer Oct 02 '15 at 08:17
  • Thanks for the response, Craig. That certainly makes sense. I probably see high IO and throughput with fio tests because of the write cache (which apparently cannot be disabled). An RDBMS is not really the suggested use case for these block devices, but it's interesting. I suppose the next question to ask Google is whether this is, in fact, a single disk or a slice of a local BBU-backed RAID, in which case I might be able to disable write barriers on the mounted volume. In one simple test (sketched below), this made for some great fsync improvement (at the cost of data resilience). – BeepBoop Oct 12 '15 at 21:39
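
If anyone wants to repeat that write-barrier test, a rough sketch (the xfs mount point and test file name are assumptions based on the setup above) looks like this:

# CAUTION: nobarrier trades durability for speed; use this only as a temporary test
mount -o remount,nobarrier /data           # drop write barriers on the xfs mount
pg_test_fsync -f /data/pg_test_fsync.out   # re-check fsync latency on that mount
mount -o remount,barrier /data             # restore the default behaviour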

2 Answers

3

Comparing a single local SSD/HDD/etc. to a SAN-type RAID controller is like comparing a VW Beetle to an Audi RS10 Le Mans car: yes, they both came out of the same factory, and they both use four-stroke engines/SSDs/HDDs, but their tuning etc. is way, way different.

I can give you several examples from experience, but the simple answer comes down to the massive amount of battery-backed RAM cache that SAN-based storage has compared to none on the local SSD/HDD. Even SSDs can't quite compete with battery-backed DDR3 RAM when it comes to confirming back that data has been "committed" to disk. Furthermore, the single local disk can (realistically) only handle a single operation at a time when writing a block to "disk", vs. the battery-backed SAN systems that can handle multiple requests "writing to disk" simultaneously (since they commit the data to the battery-backed DDR3 RAM).

Lastly, the question might be which local SSD disk is being used, as I've seen massive differences in write performance between different sizes of the same family of SSDs (the bigger, the faster), not to mention the varying speeds of the various SSD makes out there.

Yes, SSDs are faster than HDDs, but not yet as fast as battery-backed DDR3 RAM ;)

Hvisage
  • 386
  • 2
  • 7
0

Google acknowledges that write cache flushes are fairly slow on Local SSD and provides steps to disable write caching on the filesystem mount to avoid that delay [if suitable for your use case]. Docs are here.
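
From memory the documented steps boil down to mounting without write barriers; a rough sketch (the device path and mount point are placeholders, and the exact option names may vary by filesystem and kernel version) is:

# Hypothetical mount of a Local SSD without write-cache flushes, roughly following the linked docs
# Check the docs for the exact options; nobarrier sacrifices durability on power loss
mount -o discard,defaults,nobarrier /dev/disk/by-id/google-local-ssd-0 /mnt/disks/ssd0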

Chris Madden
  • 193
  • 1
  • 7