
We are having a problem with our Swift cluster, running Swift version 1.8.0. The cluster is built from 3 storage nodes + a proxy node, with 2x replication. Each node has a single 2TB SATA HDD for storage; the OS runs on an SSD. The traffic is ~300 1.3MB files per minute, and the files are all the same size. Each file is uploaded with an X-expire-after header set to the equivalent of 7 days.
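For reference, each upload looks roughly like the request below; the proxy URL, token, account, container and object names are placeholders, and we assume the standard X-Delete-After header (a relative lifetime in seconds, so 7 days = 604800):

# Hypothetical upload; URL, token and names are placeholders, not our real ones.
# X-Delete-After takes a lifetime in seconds: 7 * 24 * 3600 = 604800.
curl -i -X PUT \
     -H "X-Auth-Token: $TOKEN" \
     -H "X-Delete-After: 604800" \
     --data-binary @whatever.file \
     http://proxy:8080/v1/AUTH_<account>/<container>/whatever.file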

When we started the cluster around 3 months ago we uploaded significantly fewer files (~150/min) and everything was working fine. As we put more pressure on the system, at one point the object expirer could no longer expire files as fast as they were being uploaded, slowly filling up the servers.

After our analysis we found the following:

  • It's not a network issue: the interfaces are not overloaded and we don't have an extreme number of open connections
  • It's not a CPU issue: load averages are fine
  • It doesn't seem to be a RAM issue: we have ~20G free out of 64G

The bottleneck seems to be the disk, iostat is quite revealing:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00    57.00    0.00  520.00     0.00  3113.00    11.97   149.18  286.21    0.00  286.21   1.92 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               2.00    44.00    7.00  488.00   924.00  2973.00    15.75   146.27  296.61  778.29  289.70   2.02 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     3.00   60.00  226.00  5136.00  2659.50    54.51    35.04  168.46   49.13  200.14   3.50 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  110.00   91.00  9164.00  2247.50   113.55     2.98   14.51   24.07    2.95   4.98 100.00

The read and write wait times are not always that good :); they can climb into the thousands of milliseconds, which is pretty dreadful.
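For context, extended device stats like the above come from something along these lines (the 5-second interval is just an example, not necessarily what we ran):

iostat -d -x -k sdc 5    # extended (-x) per-device (-d) stats in kB (-k) for sdc, every 5 seconds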

We're also seeing many ConnectionTimeout messages on the storage nodes and on the proxy.

Some examples from the storage nodes:

Jul 17 13:28:51 compute005 object-server ERROR container update failed with 10.100.100.149:6001/sdf (saving for async update later): Timeout (3s) (txn: tx70549d8ee9a04f74a60d69842634deb)
Jul 17 13:34:03 compute005 swift ERROR with Object server 10.100.100.153:6000/sdc re: Trying to DELETE /AUTH_698845ea71b0e860bbfc771ad3dade1/container/whatever.file: Timeout (10s) (txn: tx11c34840f5cd42fdad123887e26asdae)
Jul 17 12:45:55 compute005 container-replicator ERROR reading HTTP response from {'zone': 7, 'weight': 2000.0, 'ip': '10.100.100.153', 'region': 1, 'port': 6001, 'meta': '', 'device': 'sdc', 'id': 1}: Timeout (10s)

And also from the proxy:

Jul 17 14:37:53 controller proxy-server ERROR with Object server 10.100.100.149:6000/sdf re: Trying to get final status of PUT to /v1/AUTH_6988e698bc17460bbfc74ea20fdcde1/container/whatever.file: Timeout (10s) (txn: txb114c84404194f5a84cb34a0ff74e273)
Jul 17 12:32:43 controller proxy-server ERROR with Object server 10.100.100.153:6000/sdc re: Expect: 100-continue on /AUTH_6988e698bc17460bbf71ff210e8acde1/container/whatever.file: ConnectionTimeout (0.5s) (txn: txd8d6ac5abfa34573a6dc3c3be71e454f)

If all the services pushing to Swift and the object-expirer are stopped, disk utilization stays at 100% most of the time. There are no async_pending transactions, but there is a lot of rsyncing going on, probably from the object-replicator. With everything turned on, there are 30-50 or more async_pending transactions at almost any given moment.
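For what it's worth, we count the pending container updates on a storage node roughly like this (the /srv/node mount point is an assumption; adjust it to wherever your devices are mounted):

# Count queued async container updates on this node
find /srv/node/*/async_pending -type f 2>/dev/null | wc -l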

We have thought about different ways to mitigate the problem; this is basically the outcome:

  • SSDs for storage are too expensive, so that won't happen
  • Pairing each disk with a second HDD in RAID0 (we already have replication in Swift); see the sketch after this list
  • Using some caching, like bcache or flashcache
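A minimal sketch of the RAID0 option, assuming a hypothetical second disk /dev/sdd next to the existing /dev/sdc (device names and filesystem options are assumptions, and creating the array wipes both disks):

# Stripe the two HDDs; redundancy comes from Swift's replication, not the array
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdc /dev/sdd
# XFS with 1024-byte inodes is the usual choice for Swift object servers
mkfs.xfs -i size=1024 /dev/md0
mount -o noatime,nodiratime,logbufs=8 /dev/md0 /srv/node/md0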

Does any of you have experience with this kind of problem? Any hints on other places to look for the root cause? Is there a way to optimize the expirer/replicator performance?
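For that last question, the knobs we have found so far are the interval/concurrency settings in object-expirer.conf and in the [object-replicator] section of object-server.conf; the values below are purely illustrative, and we haven't verified that all of them exist in 1.8.0:

# object-expirer.conf (illustrative values)
[object-expirer]
interval = 300
concurrency = 4

# object-server.conf (illustrative values)
[object-replicator]
concurrency = 2
run_pause = 60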

If any additional info is required, just let me know.

Thanks

Bszabo

1 Answer


I've seen issues where containers with >1 million objects cause timeouts (due to the sqlite3 DB not being able to get a lock)... can you verify your containers' object counts?
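If it helps, a quick way to check is to HEAD the container; the names and credentials below are placeholders:

# Prints an "Objects:" line with the container's object count
swift stat <container_name>
# Or directly against the proxy; the count comes back in X-Container-Object-Count
curl -I -H "X-Auth-Token: $TOKEN" http://proxy:8080/v1/AUTH_<account>/<container_name>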

stensonb
  • Do you also mean containers inside containers? Because then we have more than 1M. Plain objects inside containers are not more than 1500, but we have a tree structure. – Bszabo Jul 20 '14 at 15:49
  • AFAIK - that "nested" logic is not supported in swift. A container can only hold objects. Now, you can externally model "objects" as pointers to other containers...but that should be fine. The swift docs say avoid containers with an object count >1 million. – stensonb Jul 30 '14 at 18:28
  • You're absolutely right, sorry for my mistake. Actually our problem seems to be that we have more than 1M objects in the container, so we'll clean up and prevent the container from storing too many objects in the future. – Bszabo Aug 01 '14 at 09:02