I run a Kubernetes cluster on version 1.5.2, set up with kops on AWS. Nothing about the setup is exotic. My nodes are m4.xlarge instances, each with 70 GB of disk storage provisioned at 1000 IOPS.

I have periods where some of my nodes go crazy with IOPS. Monitoring shows that `du`, running on the Docker overlay directory, takes all of my IOPS. Here is what the kubelet logs display:

fsHandler.go:131] du and find on following dirs took 4.22914425s: [/var/lib/docker/overlay/592c1d88d1fd115f21e8fe6f198a8a27cd44efefb9b5dc58940fbf6d7999eda3 /var/lib/docker/containers/2347d28886bc0e6b74fc326538e1483927ddeb89b38e035acd845d5db621cb79]
fsHandler.go:131] du and find on following dirs took 24.94283434s: [/var/lib/docker/overlay/81f24df3624ebf7b7e45edc38fafeb41958bc675ae57fd0126c44cb2c3a6d6d6 /var/lib/docker/containers/43d576931081500fd4cd316afe5bfc6ff2442ff20e8e8266c27e930a0a77dd34]
fsHandler.go:131] du and find on following dirs took 18.478782737s: [/var/lib/docker/overlay/422ef31413df4e76de51acaa7d6ff6f77edc65fabde88a7c70e7edad3b1e55e5 /var/lib/docker/containers/1519a33729c8fb13297358edc53fe22f0b4b684636884976dfcb67c47fbf320b]
helpers.go:101] Unable to get network stats from pid 13515: couldn't read network stats: failure opening /proc/13515/net/dev: open /proc/13515/net/dev: no such file or directory
fsHandler.go:131] du and find on following dirs took 7.971745844s: [/var/lib/docker/overlay/45b83939bd1b4ec7dfa627bb6a9eb8b89a380007f9e22a93fff2ba4054252271 /var/lib/docker/containers/f6d3387423398d7dd4fac6c19ee0a1446d0465b5f9cf90289fcd605ad28c0d6e]
fsHandler.go:131] du and find on following dirs took 5.886763577s: [/var/lib/docker/overlay/8c01a73671eedb2e62c58fa12fc2d25df58c506545b6ea048fa0db1756d19f2c /var/lib/docker/containers/1d9c0ebcc6dbbd7065923f7f81c05c0d9d710aed0d353a1bab90ce1c994dfb57]
fsHandler.go:131] du and find on following dirs took 5.714942029s: [/var/lib/docker/overlay/26213ba30a17f240a9b9756a0d23ab32550f921de533667c9ab91cfb7f10ed5b /var/lib/docker/containers/7c27c242a49d8d33cee8b2e8335dae450af13b26f010794dc83ef5750a212d0d]
fsHandler.go:131] du and find on following dirs took 6.111478835s: [/var/lib/docker/overlay/0fe2bd0feeda24699bd6d443ca126ac1a33071cdff039ae9fd9159bbef80867b /var/lib/docker/containers/ec6fb966139e9666ec0be5e13399773f1971ddd99841b84167a7463402e28d73]
fsHandler.go:131] du and find on following dirs took 2.661604836s: [/var/lib/docker/overlay/04f9d01a8863cfee26e678e938fced84f826dda6ed03626dda11b6aad6901465 /var/lib/docker/containers/a4e37aee69c7523c46c5252c1834fa3fcd5a804a7aee256a468e44b4d6bcbd64]
fsHandler.go:131] du and find on following dirs took 11.834409809s: [/var/lib/docker/overlay/4cb1476621b90e2c2ee2b1131c0e6ac62f62dc3ca418129812b487bffac1d827 /var/lib/docker/containers/5a01521cfdd3041aff128dce7353ab336ddafa60c8c0b2254fb6bae697cb1676]
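To see which directories are the slowest, log lines like the ones above can be parsed and ranked by duration. This is a minimal sketch, not part of kubelet or cAdvisor: the function name `parse_du_timings` is hypothetical, and the regular expression assumes the duration is always printed as plain seconds (for example `4.22914425s`), as in this excerpt. Lines in any other form are skipped.

```python
import re

# Matches lines such as:
#   fsHandler.go:131] du and find on following dirs took 4.22914425s: [/dir1 /dir2]
# Assumption: the duration is printed as plain seconds, as in the excerpt above.
LINE_RE = re.compile(
    r"du and find on following dirs took ([0-9.]+)s: \[([^\]]*)\]"
)


def parse_du_timings(log_text):
    """Return a list of (seconds, [dirs]) tuples, slowest first."""
    timings = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            seconds = float(match.group(1))
            dirs = match.group(2).split()
            timings.append((seconds, dirs))
    # Sort by duration only, longest run first.
    return sorted(timings, key=lambda item: item[0], reverse=True)
```

Feeding it the saved kubelet log text and printing the first few entries gives the overlay directories worth inspecting first.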
rmonjo
  • Are there lots of containers/pods starting and then immediately stopping? I've seen this kind of thrashing when a container fails to start and the orchestrator just keeps retrying. – Robo Mar 13 '17 at 03:34
  • All of my pods have been running for more than 5 days, so it doesn't look like it's due to pod restarts. Plus, I only have long-running processes. `iotop` keeps showing me processes like `du -s /var/lib/docker/overlay/DIGEST`. – rmonjo Mar 13 '17 at 08:58

1 Answer

I recommend upgrading to Kubernetes 1.6. The CHANGELOG notes many updates that should help you debug this issue.

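Also note that EBS volumes created from a snapshot (which includes root volumes launched from an AMI) do not deliver their full I/O performance until they have been initialized ("pre-warmed") by reading every block on the device. New, empty volumes do not need this.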

diclophis
  • I have been running Kubernetes 1.7 and I see the same issue. In my case the containers are large (about 2 GB each), so I can see why `du` would take so long. But the process appears to run about once a minute. – Lindsay Landry Aug 24 '17 at 13:41
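
One way to check whether container size alone explains the slow runs is to time a comparable directory walk yourself. The sketch below is an assumption-laden stand-in, not what kubelet actually executes: the helper name `du_like_walk` is hypothetical, and it sums apparent file sizes via `lstat`, whereas `du` reports disk blocks. What matters here is the elapsed time and the number of files visited.

```python
import os
import time


def du_like_walk(root):
    """Walk `root` and stat every file, roughly the work `du -s` must do.

    Returns (total_bytes, file_count, elapsed_seconds).
    """
    total_bytes = 0
    file_count = 0
    start = time.monotonic()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat does not follow symlinks, so links are not double counted.
                total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
                file_count += 1
            except OSError:
                # The file disappeared or is unreadable; skip it.
                continue
    return total_bytes, file_count, time.monotonic() - start
```

Running it against one of the `/var/lib/docker/overlay/...` directories from the logs (as root, on the affected node) shows whether the walk is slow because of the file count or because the disk is already saturated.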