
I'm trying to write a little service that generates StableDiffusion images. It is a simple FastAPI application with a Celery queue, running in a Docker container with access to a GPU. When running the application outside of the Docker container everything works fine, but when it's launched in Docker it halts.
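
For context, the shape of the setup is roughly this (a sketch with hypothetical names, not my actual code): the FastAPI endpoint only enqueues a Celery task, and the task runs the pipeline on the GPU.

# tasks.py -- hypothetical names, just to illustrate the setup
import torch
from celery import Celery
from diffusers import StableDiffusionPipeline

celery_app = Celery('tasks', broker='redis://redis:6379/0')

@celery_app.task
def generate_image(prompt: str) -> str:
    pipe = StableDiffusionPipeline.from_pretrained(
        'prompthero/openjourney-v4',
        torch_dtype=torch.float16
    ).to('cuda')
    # execution silently stops somewhere inside this call when running in Docker
    image = pipe(prompt).images[0]
    path = '/tmp/result.png'
    image.save(path)
    return path

# the FastAPI endpoint just calls generate_image.delay(prompt) and returns the task id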

It doesn't throw any errors and doesn't return anything. It just... stops.

After spraying the internals of the libraries I'm using with log statements, I found that the stopping point is this function:

# it's in torch.nn.functional
# function `embedding`

if has_torch_function_variadic(input, weight):
    ...

Inspecting the has_torch_function_variadic function, I encountered this:

# this one is from torch._C.__init__.py
def _has_torch_function_variadic(*args, **kwargs): # real signature unknown
    """
    Special case of `has_torch_function` that skips tuple creation.
    
        This uses the METH_FASTCALL protocol introduced in Python 3.7
    
        Instead of:
          `has_torch_function((a, b))`
        call:
          `has_torch_function_variadic(a, b)`
        which skips unnecessary packing and unpacking work.
    """
    pass
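
For what it's worth, that pass body is just the stub the IDE shows for a C extension; the real implementation is native code and is normally perfectly callable. A quick way to confirm the binding itself works (a sketch, nothing specific to my setup):

import torch

t = torch.zeros(1)
# returns False for a plain tensor, True only for tensor-like objects
# that define __torch_function__
print(torch._C._has_torch_function_variadic(t))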

So the StableDiffusion pipeline stops working because it stumbles on some C code that, for some reason, it can't execute. I'm really quite lost at this point and don't know what the issue could be.

I've tried rearranging my container build. Previously I used a two-stage build to reduce the container size, but I thought that transferring files from one container to another, when I don't really know where all of the stuff is saved, might lead to big trouble. Rearranging my Dockerfile into a single-stage build did not help.

I suspect the problem may lie in the way I download the pre-trained model: to avoid a long download on the first request to the SD pipeline after every container rebuild, I pre-download it in a little script executed when the container image is built. Here's the script:

import torch
from diffusers import StableDiffusionPipeline


# pre-download the weights into the Hugging Face cache so the first
# request to the pipeline doesn't have to fetch them at runtime
pipe = StableDiffusionPipeline.from_pretrained(
    'prompthero/openjourney-v4',
    torch_dtype=torch.float16
)
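
As far as I understand, from_pretrained stores the weights in the Hugging Face cache (~/.cache/huggingface by default), so the location used during the image build has to be the same one the app sees at runtime. Pinning the cache explicitly should at least make the location predictable; cache_dir is a real from_pretrained argument, the directory here is just an example:

import torch
from diffusers import StableDiffusionPipeline

# pin the cache so the weights baked in at build time are found at runtime
pipe = StableDiffusionPipeline.from_pretrained(
    'prompthero/openjourney-v4',
    torch_dtype=torch.float16,
    cache_dir='/code/model_cache'
)

Setting the HF_HOME environment variable in the Dockerfile achieves roughly the same thing without touching the code.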

And a little bash script that the Dockerfile uses to launch it:

#!/bin/bash

python build.py

My Dockerfile looks like this:

FROM python:3.10

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

WORKDIR /code/

COPY requirements.txt /code/
RUN pip install -r requirements.txt

COPY build.py /code/
COPY docker-build.sh /code/

RUN ./docker-build.sh

COPY . /code/

Is there any way my problem can be resolved? Is it even standard practice to run neural networks in Docker containers, launched from a Celery task, or is my whole approach bogus here? Any help or suggestions are much appreciated!

UPD

So I've tried switching from Celery to FastAPI background tasks and got some interesting developments:

StableDiffusionPipeline seems to start successfully if it is launched in a separate thread instead of a separate process (as is the case with Celery, which creates new processes), but it still fails with an error. I will include the error later, when I'm back at work, but long story short: it's some error about CUDA and the NVIDIA drivers being installed incorrectly.
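
The switch itself looks roughly like this (a minimal sketch with hypothetical names, not my actual code); the important part is that a sync task handed to BackgroundTasks runs in a worker thread of the same process, not in a forked child:

import torch
from fastapi import BackgroundTasks, FastAPI
from diffusers import StableDiffusionPipeline

app = FastAPI()
pipe = StableDiffusionPipeline.from_pretrained(
    'prompthero/openjourney-v4',
    torch_dtype=torch.float16
).to('cuda')

def generate(prompt: str) -> None:
    # a sync function passed to BackgroundTasks is run in a thread pool
    image = pipe(prompt).images[0]
    image.save('/tmp/result.png')

@app.post('/generate')
async def generate_endpoint(prompt: str, background_tasks: BackgroundTasks):
    background_tasks.add_task(generate, prompt)
    return {'status': 'queued'}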

So here's what we've got for now:

  1. PyTorch falls into the void when launched from Celery inside of Docker. I'm not sure if it's exclusively a Docker issue, a Docker + Celery issue, or exclusively a Celery issue; I will test it later.

  2. The problem may lie in the multiprocess nature of Celery: it may be that PyTorch is not really suited to running in a separate process, and that results in multiple errors (see the pool sketch below).

  3. I've reported an issue in the official PyTorch GitHub repo; I hope the library maintainers can clear up some of the confusion. Here's the issue for anyone interested in how this story develops: https://github.com/pytorch/pytorch/issues/103752

I'll keep everyone informed here and in the GitHub issue. If anyone knows what could be the reason for this behaviour, please let me know.
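
To probe point 2, one experiment I want to try (an assumption on my part, not something I've verified yet) is forcing Celery to stop forking child processes for tasks, since CUDA is known to dislike being initialized in a forked process:

# hypothetical celery app module, e.g. worker.py
from celery import Celery

app = Celery('worker', broker='redis://redis:6379/0')

# the default 'prefork' pool forks child processes; 'solo' runs tasks
# in the worker's main process, 'threads' uses a thread pool instead
app.conf.worker_pool = 'solo'

# equivalent CLI form:
#   celery -A worker worker --pool=solo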

UPD-2:

I've run some more tests with the multithreaded approach of FastAPI background tasks and without Docker, and now I can confirm: this is not a Docker issue.

You see, even when launching Celery on the native machine without any containerization, the code still falls into the void. So I can confirm that it is a Celery issue, not a containerization one.

I'll change the tags on the question to avoid confusion.
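
To isolate Celery from the StableDiffusion stack, a minimal reproduction along these lines might be the next step (my own sketch, untested): a task that does nothing but touch CUDA. If even this stalls in a prefork worker, the problem is Celery + CUDA rather than diffusers.

import torch
from celery import Celery

app = Celery('repro', broker='redis://localhost:6379/0')

@app.task
def cuda_probe() -> float:
    # the first CUDA op initializes the driver context inside the worker process
    x = torch.ones(8, device='cuda')
    return x.sum().item()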

Also, here's the traceback I get when I try to launch the pipeline in a multithreaded environment:

Traceback (most recent call last):
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
...

  <-- Abridged till the pipeline gets into the mix -->

...
  File "/home/raa/PycharmProjects/air_local_ml/ml_local/models/services/openjourney.py", line 25, in generate
    image = self.pipe(
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 645, in __call__
    prompt_embeds = self._encode_prompt(
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 357, in _encode_prompt
    prompt_embeds = self.text_encoder(
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 816, in forward
    return self.text_model(
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 725, in forward
    encoder_outputs = self.encoder(
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 654, in forward
    layer_outputs = encoder_layer(
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 382, in forward
    hidden_states = self.layer_norm1(hidden_states)
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
    return F.layer_norm(
  File "/home/raa/PycharmProjects/air_local_ml/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Exploring the error, I found that it is generally reported as a driver/CUDA issue that requires reinstalling them, but when the app is launched in a single-process, single-thread environment, everything works correctly.

It may be that launching PyTorch from a non-main thread (or, if you dare, a non-main process) leads to these bugs. I'm not really sure whether it just isn't designed to be run this way, or whether it's also a matter of settings.
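
One sanity check that might narrow it down: this exact message also shows up when a float16 LayerNorm is executed on the CPU, so it's worth confirming, from inside the worker thread itself, which device and dtype the pipeline actually ends up on (a sketch with hypothetical names, mirroring the self.pipe attribute from the traceback):

import torch

def debug_pipe_placement(pipe) -> None:
    # where does the text encoder actually live, and in what precision?
    param = next(pipe.text_encoder.parameters())
    print('device:', param.device)                      # expect cuda:0, not cpu
    print('dtype:', param.dtype)                        # expect torch.float16
    print('cuda visible here:', torch.cuda.is_available())

# e.g. call debug_pipe_placement(self.pipe) right before self.pipe(...) in generate()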

I will dig up some more info on the bug and update the GitHub issue on the matter. I hope to fix it sooner rather than later and bring it back here resolved.

  • this is an excellent question. You have done a lot of work and discovery... i have come across problems with `c code` before and have had to stop as one needs to inspect and correct this. It is really broken here as you have correctly identified. – D.L Jun 16 '23 at 09:18
  • i have never been a fan of docker, but clearly a docker setting ? (given stuck at the `c file`). – D.L Jun 16 '23 at 09:23
  • @D.L thank you for the suggestion! It may be a docker setting, but I'm not really sure which one - may have a deep dive tonight to find what works or what not – Redman_plus Jun 16 '23 at 12:09
  • well, put it this way. You confirm it works outside of docker :) – D.L Jun 16 '23 at 17:43
  • Have you tried running the container in interactive mode and then running it in the interactive terminal? that may give you more debugging options. – justhecuke Jun 17 '23 at 06:15
  • Also, have you tried using the official pytorch docker image? https://github.com/pytorch/pytorch#using-pre-built-images – justhecuke Jun 17 '23 at 06:17
  • You also need to tell us what command you used to run your container... using GPUs in docker requires specific flags. they're easy to add, but you have to have them. – justhecuke Jun 17 '23 at 06:18
  • @justhecuke Answering your questions: 1) No, I haven't tried an interactive mode - might use it on monday when I'm back at work and update you on the results 2) Here's also no, also will try it on monday 3) I am using docker compose for launching the containers with a simple `sudo docker compose up --build`, with the GPU specified as a device in `docker-compose.yml`. I've included an example of my compose file in the GitHub issue for PyTorch, but I might include it here also a little bit later – Redman_plus Jun 17 '23 at 15:28
