
I have a Python application that uses the Python sh module many times to run commands. I also use LXD containers to run isolated tests.

I noticed very different performance when running my tests in an LXD container, so I started reducing the complexity of the Python script.

Now the script is a simple sh.nice(), but there is still a large difference between the host and the LXD container.

Host

$ time python -c "import sh; sh.nice()"
real    0m0.077s
user    0m0.052s
sys     0m0.012s

Container

$ time python -c "import sh; sh.nice()"
real    0m0.215s
user    0m0.088s
sys     0m0.120s

My next step was to use strace, which shows that the container version calls the close syscall 1,048,796 times! Most of those calls return EBADF (Bad file descriptor).

Here is what I did. What is happening?

$ uname -a
Linux cmp-1 4.4.0-96-generic #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ lxc launch ubuntu:precise new-precise-sh
$ lxc exec new-precise-sh -- bash
$ apt-get install python-pip
$ pip install sh
$ strace -f -e close python -c "import sh; sh.nice()" 2>&1 | wc -l
1048796

I then ran strace with the output split into one file per process. One of the files, strace.2618, is far larger than the others:

$ strace -ff -o strace python -c "import sh; sh.nice()"
$ ls -la
total 75276
drwxrwxr-x  2 user   user       4096 Sep 27 16:58 .
drwxr-xr-x 41 user   user       4096 Sep 27 16:45 ..
-rw-r--r--  1 root   root     121780 Sep 27 16:33 strace.2615
-rw-r--r--  1 root   root       3995 Sep 27 16:33 strace.2616
-rw-r--r--  1 root   root       6108 Sep 27 16:33 strace.2617
-rw-r--r--  1 root   root   76558803 Sep 27 16:33 strace.2618
-rw-r--r--  1 root   root     362363 Sep 27 16:33 strace.2619
-rw-r--r--  1 root   root      10748 Sep 27 16:33 strace.2620

The content of that file looks something like this:

$ cat strace.2618
...................
...................
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
close(3)                                = 0
close(4)                                = 0
close(5)                                = -1 EBADF (Bad file descriptor)
close(6)                                = -1 EBADF (Bad file descriptor)
close(7)                                = 0
close(8)                                = 0
close(9)                                = -1 EBADF (Bad file descriptor)
close(10)                               = 0
close(11)                               = -1 EBADF (Bad file descriptor)
...................
...................
close(33)                               = -1 EBADF (Bad file descriptor)
close(34)                               = -1 EBADF (Bad file descriptor)
close(35)                               = -1 EBADF (Bad file descriptor)
close(36)                               = -1 EBADF (Bad file descriptor)
close(37)                               = -1 EBADF (Bad file descriptor)
close(38)                               = -1 EBADF (Bad file descriptor)
...................
...................
close(1048568)                          = -1 EBADF (Bad file descriptor)
close(1048569)                          = -1 EBADF (Bad file descriptor)
close(1048570)                          = -1 EBADF (Bad file descriptor)
close(1048571)                          = -1 EBADF (Bad file descriptor)
close(1048572)                          = -1 EBADF (Bad file descriptor)
close(1048573)                          = -1 EBADF (Bad file descriptor)
close(1048574)                          = -1 EBADF (Bad file descriptor)
close(1048575)                          = -1 EBADF (Bad file descriptor)
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 -opost -isig -icanon -echo ...}) = 0
  • Python will try to close all fds from 3 up to the maximum (which is usually the ulimit for file descriptors): https://github.com/certik/python-2.7/blob/c360290c3c9e55fbd79d6ceacdfc7cd4f393c1eb/Lib/subprocess.py#L1097 Consider reducing the soft limits via `/etc/security/limits.conf` (depends on the distro) – myaut Sep 27 '17 at 18:17
  • Thanks!! Here is the reason: https://github.com/lxc/lxd/issues/3860. It is the sh module that closes all the file descriptors, and in an LXD container the limit is much higher – carlosduelo Sep 27 '17 at 20:22
  • I think you should post the solution from the issue you opened as an answer, so this question won't be left unanswered. – myaut Sep 28 '17 at 17:15

1 Answer


The project leader of LXC, LXD and LXCFS answered the question in the LXD issue linked in the comments (https://github.com/lxc/lxd/issues/3860):

This iterates over every single fd number up to RLIMIT_NOFILE, closing them all one by one, regardless of whether the fd actually does exist.

On a normal system, this will be capped at 1024, the most common NOFILE limit. In LXD containers, we have it bumped to a much higher value, causing it to take longer.
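To make the cost concrete, here is a minimal Python 3 sketch of the arithmetic behind the strace output above. It is not the sh module's code; the function names are made up for illustration, and the count of open descriptors relies on /proc/self/fd, so that part only works on Linux.

```python
import os
import resource


def close_calls_for_limit(soft_limit, first_fd=3):
    """Number of close() calls made by a loop over first_fd..soft_limit-1.

    Hypothetical helper: it only mirrors the pattern seen in the strace
    output, where close(3) through close(1048575) were issued.
    """
    return max(soft_limit - first_fd, 0)


def open_fd_count():
    """Descriptors actually open in this process, or None without /proc.

    The count includes the descriptor used to read the directory itself.
    """
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return None
    return len(os.listdir(fd_dir))


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        print("soft NOFILE limit is unlimited")
    else:
        print("brute-force close() calls: {}".format(close_calls_for_limit(soft)))
    print("descriptors actually open: {}".format(open_fd_count()))
```

With the container's limit of 1024*1024 the loop makes 1,048,573 calls, against 1,021 for a limit of 1024, while the process typically has only a handful of descriptors open.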

So it may be better to use the subprocess module in an LXD container.
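Another option, along the lines of the comment about reducing the soft limit, is to lower RLIMIT_NOFILE inside the Python process before spawning anything. The sketch below is only an illustration under assumptions: the helper names and the target of 1024 are made up, and it helps only if the code doing the closing reads the soft limit, which I have not verified for sh.

```python
import resource
import subprocess
import sys


def lower_nofile_soft_limit(target=1024):
    """Lower the soft NOFILE limit to at most `target`; return the new soft limit.

    Hypothetical helper. Lowering a soft limit needs no privileges, and
    child processes inherit the lowered value.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)
    if soft == resource.RLIM_INFINITY or soft > new_soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


def run_with_lowered_limit(cmd, target=1024):
    """Lower the soft limit, then run `cmd` with subprocess and return its output."""
    lower_nofile_soft_limit(target)
    return subprocess.check_output(cmd)


if __name__ == "__main__":
    # The child prints the soft limit it inherited from this process.
    child = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    print(run_with_lowered_limit([sys.executable, "-c", child]).decode().strip())
```

Calling run_with_lowered_limit(["nice"]) would be the subprocess counterpart of the sh.nice() call from the question. Keep in mind that a lowered limit also caps how many files the process itself can open afterwards.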
