
I've got a Python 3.7.2 asyncio-based application. There is an endpoint exposing some thread info:

import sys, threading, traceback

threads_info = {}
for thread in threading.enumerate():
    threads_info[str(thread)] = traceback.format_stack(sys._current_frames()[thread.ident])

As far as I know there should be no threads running other than the main thread, yet when I query the endpoint I see this weird ThreadPoolExecutor. It starts with just one worker and keeps growing:

(screenshot: thread listing showing an increasing number of ThreadPoolExecutor worker threads)

Any ideas why, how, and what this ThreadPoolExecutor is? Perhaps there is some way to see where in the code it is created, or which package creates it?

The Dockerfile I use to run my app:

FROM python:3.7.2-alpine as base

FROM base as builder
RUN mkdir /install
WORKDIR /install
COPY requirements /requirements
RUN apk add \
    "gcc>8.2.0" \
    "g++>8.2.0" \
    "libffi-dev>3.2.1" \
    "musl-dev>1.1.20"
RUN pip install --install-option="--prefix=/install" -r /requirements

FROM base
RUN apk add --no-cache procps
COPY --from=builder /install /usr/local
COPY src /app
WORKDIR /app
RUN mkdir logs
ENTRYPOINT ["python", "-u", "app.py"]
EXPOSE 80/tcp

My requirements file:

quart==0.8.1
aiohttp==3.5.4
cchardet==2.1.4
aiodns==1.2.0
requests==2.21.0
psutil==5.6.1

2 Answers


Any ideas why, how and what is this ThreadPoolExecutor?

ThreadPoolExecutor is the thread pool implementation provided by the concurrent.futures module. It is used for asynchronous execution of synchronous code by handing it off to a separate thread. The pool's purpose is to avoid the latency of creating and joining a thread for each separate task; instead, a pool creates its worker threads only once and keeps them around for later use. The maximum number of threads in the pool is configurable and, on Python 3.7, defaults to the number of CPU cores multiplied by 5.
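For illustration, here is a minimal, self-contained sketch of how such a pool is used (the function name and thread name prefix are just examples, not from the original code):

import os
from concurrent.futures import ThreadPoolExecutor

def blocking_task(n):
    # stand-in for any synchronous, blocking piece of work
    return n * n

# On Python 3.7, max_workers=None means os.cpu_count() * 5 worker threads.
with ThreadPoolExecutor(thread_name_prefix="demo") as pool:
    future = pool.submit(blocking_task, 3)   # runs in a worker thread
    print(future.result())                   # 9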

The threads you see belong to a ThreadPoolExecutor instantiated by one of the libraries you are using. Specifically, asyncio creates a default executor for use by the run_in_executor method. That executor is also used by asyncio itself to provide an async interface to calls that natively don't have one, such as OS-provided DNS resolution.
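For example, a blocking call can be pushed onto that default executor with run_in_executor; this is only a sketch, with socket.gethostbyname standing in for any blocking function:

import asyncio
import socket

async def resolve(host):
    loop = asyncio.get_running_loop()
    # Passing None as the executor uses the loop's default ThreadPoolExecutor,
    # which asyncio creates lazily on first use.
    return await loop.run_in_executor(None, socket.gethostbyname, host)

print(asyncio.run(resolve("example.com")))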

In general, when using non-trivial third-party libraries, you cannot assume that your code will be the only code creating threads. When iterating over live threads, simply ignore the ones you didn't create; this can be accomplished, for example, by marking the threads you create with a custom attribute on the Thread object, as sketched below.
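One possible way to do that (a sketch; start_tagged_thread and the my_app_thread attribute are made up for illustration):

import threading

def start_tagged_thread(target, *args):
    t = threading.Thread(target=target, args=args)
    t.my_app_thread = True          # custom marker attribute
    t.start()
    return t

def my_threads():
    # Only threads carrying the marker; executor/library threads are skipped.
    return [t for t in threading.enumerate()
            if getattr(t, "my_app_thread", False)]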


Perhaps there is some way to see where in the code is it created or which package creates it?

Yes, as the previous answer mentioned, it was the asyncio default executor. In order to debug which package was the culprit, I had to write my own executor:

import traceback
from concurrent.futures import ThreadPoolExecutor


class AsyncioDefaultExecutor(ThreadPoolExecutor):

    def __init__(self, thread_name_prefix='', max_workers=None):
        self.logger = get_logger("asyncioTh")  # the application's own logging helper
        super().__init__(max_workers=max_workers,
                         thread_name_prefix=thread_name_prefix)

    def submit(self, fn, *args, **kwargs):
        # Log where the submitted function is defined, plus the full call stack,
        # so the package that submitted it can be identified.
        debug_info = "Function " + fn.__name__ + " in " + fn.__code__.co_filename + ":" + \
                     str(fn.__code__.co_firstlineno) + "\n" + "".join(traceback.format_stack())
        self.logger.info(debug_info)
        return super().submit(fn, *args, **kwargs)

and set it as the default executor:

loop.set_default_executor(AsyncioDefaultExecutor())

This results in a nice traceback every time a new task is submitted.
