
I'm developing a Kubernetes FlexVolume driver that creates LVM devices, then creates and mounts filesystems on them.

For some reason I occasionally get deadlocks that, according to the documentation, shouldn't happen when using Popen.communicate().

 Traceback (most recent call last):
  File "/usr/libexec/kubernetes/kubelet/plugins/volume/exec/example~lvm/lvm", line 356, in <module>
    attach(cfg)
  File "/usr/libexec/kubernetes/kubelet/plugins/volume/exec/example~lvm/lvm", line 231, in attach
    result = _lvcreate(cfg['lv_name'], cfg['lv_size'], cfg['vg_name'])
  File "/usr/libexec/kubernetes/kubelet/plugins/volume/exec/example~lvm/lvm", line 148, in _lvcreate
    _out, _err = proc.communicate()
  File "/usr/lib64/python2.7/subprocess.py", line 800, in communicate
    return self._communicate(input)
  File "/usr/lib64/python2.7/subprocess.py", line 1401, in _communicate
    stdout, stderr = self._communicate_with_poll(input)
  File "/usr/lib64/python2.7/subprocess.py", line 1455, in _communicate_with_poll
    ready = poller.poll()
KeyboardInterrupt

This sometimes occurs during my lvcreate and mkfs calls. Setting shell=True makes no difference.

import subprocess

def _lvcreate(lv_name, lv_size, vg_name):
    _lv = None
    _cmd = ['/sbin/lvcreate', '--type', 'linear', '--size', lv_size, '--name', lv_name, vg_name]
    # Capture both streams; communicate() is the call that occasionally never returns
    proc = subprocess.Popen(_cmd, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    _out, _err = proc.communicate()
    if proc.returncode != 0:
        return (_lv, _err, proc.returncode)

Environment:

$ uname -a
Linux myhost.example.com 4.1.12-124.17.2.el7uek.x86_64 #2 SMP Tue Jul 17 20:28:07 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux

# python -V
Python 2.7.5

If I set stderr=None instead of stderr=subprocess.PIPE I never see this issue.
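
Given that observation, one variant worth trying is to keep stdout on a pipe but send stderr to a temporary file, so the error text is still available without a second pipe. This is only a sketch under that assumption: `run_with_stderr_file` is a hypothetical helper name, and the child command is a stand-in for lvcreate, not the real call. It does not explain why the original hangs.

```python
import subprocess
import sys
import tempfile


def run_with_stderr_file(cmd):
    """Run cmd with stdout on a pipe and stderr in a temp file.

    Returns (out, err, returncode), the same shape as the original error path.
    """
    with tempfile.TemporaryFile() as err_file:
        proc = subprocess.Popen(cmd, shell=False,
                                stdout=subprocess.PIPE, stderr=err_file)
        # Only stdout is a pipe here, so communicate() reads just that stream
        out, _ = proc.communicate()
        # Rewind and read whatever the child wrote to stderr
        err_file.seek(0)
        err = err_file.read()
    return out, err, proc.returncode


# Stand-in child (an assumption, not lvcreate): writes to stderr, exits 1
child = [sys.executable, "-c",
         "import sys; sys.stderr.write('boom'); sys.exit(1)"]
out, err, rc = run_with_stderr_file(child)
```

With the real command, `child` would be replaced by the `_cmd` list from the snippet above.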

lenik
BenH
  • Can you use `strace` to determine the specific syscall that's blocking? (or `sysdig` to trace both the Python interpreter and its child at the same time with much less overhead). – Charles Duffy Aug 31 '18 at 13:55
  • The deadlocks might be caused by stdout=subprocess.PIPE. The docs note: "Do not use stdout=PIPE or stderr=PIPE with this function as that can deadlock based on the child process output volume. Use Popen with the communicate() method when you need pipes." – E.Serra Aug 31 '18 at 13:55
  • @E.Serra, the OP *is already* using `communicate()` to read from both pipes -- indeed, that's in the title itself. `communicate()` *requires* `stdout=PIPE` and `stderr=PIPE` to capture that content. – Charles Duffy Aug 31 '18 at 13:56
  • Yes, but he is still using stdout=subprocess.PIPE. – E.Serra Aug 31 '18 at 13:57
  • @E.Serra, ...which is perfectly acceptable; communicate() is documented to read from both pipelines (whether in threads to use multiple simultaneous blocking calls or using `wait()`/`poll()`-style calls to read from whichever one has content ready is an implementation detail) and thus not get blocked whatever order a subprocess writes to them in. Why it's not conforming with that documentation is... well... the entire impetus for this question. – Charles Duffy Aug 31 '18 at 13:57
  • @BenH, ...part of why I'm asking for a syscall-level trace, btw, is to enable creation of a [mcve] -- a reproducer someone who isn't willing to be root and otherwise have all the setup in place to run your actual `lvcreate` command can invoke to see the problem themselves. – Charles Duffy Aug 31 '18 at 14:00
  • Could it be related to this: "If I set stderr=None instead of stderr=subprocess.PIPE I never see this issue"? Maybe you are redirecting the return code, and that is why you never see it in `if proc.returncode != 0: return (_lv, _err, proc.returncode)`. – E.Serra Aug 31 '18 at 14:36
  • The official Python documentation would appear to contradict itself: "Do not use stdout=PIPE or stderr=PIPE with this function as that can deadlock based on the child process output volume. Use Popen with the communicate() method when you need pipes." ...which requires setting `stdout=PIPE` and `stderr=PIPE` otherwise you get a tuple of `None`s, but this tip is documented on methods _other_ than `Popen` – Coder Guy Feb 26 '20 at 16:48
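
A reproducer of the kind requested in the comments could start from a sketch like the one below, which needs neither root nor LVM. It is not the asker's code: the child is a hypothetical stand-in that writes more than a typical pipe buffer to both stdout and stderr, and `run_captured` is an illustrative name. On a healthy setup `communicate()` drains both pipes and returns, which is the documented behaviour being discussed; a hang here would narrow the problem to the interpreter or platform rather than to lvcreate.

```python
import subprocess
import sys

# Hypothetical stand-in for lvcreate: floods both streams, then exits non-zero
CHILD = (
    "import sys\n"
    "sys.stdout.write('o' * 200000)\n"
    "sys.stderr.write('e' * 200000)\n"
    "sys.exit(3)\n"
)


def run_captured(cmd):
    """Run cmd with both streams on pipes; return (out, err, returncode)."""
    proc = subprocess.Popen(cmd, shell=False,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads stdout and stderr until EOF, then waits for exit
    out, err = proc.communicate()
    return out, err, proc.returncode


out, err, rc = run_captured([sys.executable, "-c", CHILD])
print("stdout bytes: %d, stderr bytes: %d, returncode: %d"
      % (len(out), len(err), rc))
```

Running the parent under `strace -f` would then show which syscall blocks if the hang does reproduce.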
