I'm banging my head against the wall with this issue. We run many containers in parallel, each performing simple filesystem operations or simple Linux commands, and under certain circumstances some of them fail with memory allocation errors: the Docker container gets OOMKilled.
I believe it is not related to the specific command. tail is not the only command that fails; we have also encountered it with cp and gzip.
We have narrowed the issue down and created a script that fails almost every time once its parameters are adjusted to the underlying system.
https://github.com/keboola/processor-oom-test
With the default settings, the script generates a random CSV with 100M rows (~2.5 GB), copies it 20 times and then runs 20 containers, each executing tail -n +2 .... On an m5.2xlarge AWS EC2 instance with a 1 TB SSD, some of the containers are OOMKilled and some end with other errors. The processes are terminated with various errors:
/code/tail.sh: line 2: 10 Killed tail -n +2 '/data/source.csv' > '/data/destination.csv'
tail: error reading '/data/source.csv': Cannot allocate memory
tail: write error
(the last one is not OOMKilled)
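For readers who cannot run the linked repository, the reproduction steps described above can be sketched roughly as follows. This is not the script from processor-oom-test; it is a minimal sketch assuming a Docker CLI on the host, an alpine image, a hypothetical one-directory-per-container layout mounted at /data, and a hypothetical 256 MB limit. The row count is deliberately tiny here (the real defaults are 100M rows and 20 copies), and dry_run=True only returns the docker run commands instead of starting containers.

```python
from __future__ import annotations

import random
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical parameters; the real script defaults to 100M rows and 20 copies.
ROWS = 1_000
COPIES = 20
MEMORY_LIMIT = "256m"  # assumption: per-container memory limit
IMAGE = "alpine"       # assumption: any image with sh and tail works
TAIL_CMD = "tail -n +2 /data/source.csv > /data/destination.csv"


def generate_csv(path: Path, rows: int, seed: int = 0) -> None:
    """Write a header line plus `rows` lines of random integers."""
    rng = random.Random(seed)
    with path.open("w") as f:
        f.write("id,a,b,c\n")
        for i in range(rows):
            a, b, c = (rng.randint(0, 10**6) for _ in range(3))
            f.write(f"{i},{a},{b},{c}\n")


def prepare_copies(source: Path, workdir: Path, count: int) -> list[Path]:
    """Copy the CSV into `count` separate directories, one per container."""
    dirs = []
    for n in range(count):
        job_dir = workdir / f"job{n}"
        job_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, job_dir / "source.csv")
        dirs.append(job_dir)
    return dirs


def docker_command(data_dir: Path, memory: str, image: str) -> list[str]:
    """Build a docker run command that mounts data_dir at /data and runs tail."""
    return [
        "docker", "run", "--rm", f"--memory={memory}",
        "-v", f"{data_dir.resolve()}:/data",
        image, "sh", "-c", TAIL_CMD,
    ]


def run_all(dirs: list[Path], memory: str = MEMORY_LIMIT, image: str = IMAGE,
            dry_run: bool = False) -> list:
    """Start one container per directory concurrently and collect exit codes.

    With dry_run=True, return the commands instead of executing them.
    """
    cmds = [docker_command(d, memory, image) for d in dirs]
    if dry_run:
        return cmds
    # Start every container before waiting on any, so they all run in parallel.
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    # An exit code of 137 (128 + 9) typically means the process got SIGKILL.
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Tiny dry run: print the commands instead of starting containers.
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        source = work / "source.csv"
        generate_csv(source, ROWS)
        jobs = prepare_copies(source, work, COPIES)
        for cmd in run_all(jobs, dry_run=True):
            print(" ".join(cmd))
```

When run for real (dry_run=False), exit codes of 137 point to containers killed with SIGKILL, which is how the OOM killer terminates processes; docker inspect also reports State.OOMKilled for such containers.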
I am not aware of any reason tail should consume much memory at all. If the number of concurrently running containers is low enough, each one easily survives with 64 MB of memory; with a larger number of containers, even 256 MB is not enough. I have been watching htop and docker stats and haven't seen any spikes in memory consumption.
Things we have already tried:
- different Docker images (alpine, centos, ubuntu)
- different file systems (ext3, xfs)
- different OS distributions (centos, ubuntu)
- different instance providers (Digital Ocean, AWS)
- different types of instances and block devices
- filesystem swap/swappiness
- Docker memory swap and swappiness
Some of these changes helped, but only partially: after adjusting the memory limit or the number of containers, it crashed again every time. We even had a container with 1 GB of memory, running a simple tail on a large file, get OOMKilled.
More on what I tried several months ago: https://500.keboola.com/cp-in-docker-cannot-allocate-memory-1a5f57113dc4. The --memory-swap option turned out to help only partially.
Any suggestions? I am no Linux expert, so I may be missing something important. Any help or advice is greatly appreciated.