High latency observed on AWS Graviton (ARM) processors compared to AMD processors for an ASGI-based Django application

Question

I am running an ASGI-based Django application (Django REST Framework) on AWS Kubernetes in production. Everything runs fine on AMD processors (c5.2xlarge, c5.4xlarge). To reduce cost, we are trying to migrate the application to AWS Graviton processors (c6g.2xlarge, c6g.4xlarge), but we are observing a 10x increase in 90th-percentile (P90) latency.

The command used for running the application -

DD_DJANGO_INSTRUMENT_MIDDLEWARE=false ddtrace-run gunicorn --workers 1 --worker-tmp-dir /dev/shm --log-file=- --thread 2 --bind :8080 --log-level INFO --timeout 5000 asgi:application -k uvicorn.workers.UvicornWorker

I have one more application that is WSGI-based, and it works fine on the Graviton processor.

Dockerfile -

FROM python:3.9-slim

RUN apt update -y
RUN mv /var/lib/dpkg/info/libc-bin.* /tmp/ &&  apt install libc-bin &&  mv /tmp/libc-bin.* /var/lib/dpkg/info/
#
### Create a group and user to run our app
## ARG APP_USER=user
## RUN groupadd -r ${APP_USER} && useradd --no-log-init -r -g ${APP_USER} ${APP_USER}
#
## Install packages needed to run your application (not build deps):
##   mime-support -- for mime types when serving static files
##   postgresql-client -- for running database commands
## We need to recreate the /usr/share/man/man{1..8} directories first because
## they were clobbered by a parent image.
RUN set -ex \
    && RUN_DEPS=" \
    libpcre3 \
    git \
    mime-support \
    postgresql-client \
    libmagic1\
    fail2ban libjpeg-dev libtiff5-dev zlib1g-dev libfreetype6-dev liblcms2-dev libxslt-dev libxml2-dev \
    gdal-bin sysstat libpq-dev binutils libproj-dev procps" \
    && seq 1 8 | xargs -I{} mkdir -p /usr/share/man/man{} \
    && apt-get update && apt-get install -y --no-install-recommends $RUN_DEPS \
    && rm -rf /var/lib/apt/lists/*

ADD requirements /requirements
#ADD package.json package.json
#
## Install build deps, then run `pip install`, then remove unneeded build deps all in a single step.
## Correct the path to your production requirements file, if needed.
RUN set -ex \
    && BUILD_DEPS=" \
    build-essential \
    libpcre3-dev \
    libpq-dev \
    " \
    && apt-get update && apt-get install -y --no-install-recommends $BUILD_DEPS \
#    && npm install --production --no-save \
    && pip install --no-cache-dir -r /requirements/requirements.txt \
    && apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false $BUILD_DEPS \
    && rm -rf /var/lib/apt/lists/*
#
RUN rm -rf /requirements

RUN mkdir /code/
WORKDIR /code/
ADD . /code/

COPY ./scripts /scripts
RUN chmod +x /scripts/*

RUN mkdir -p /vol/web/media
RUN mkdir -p /vol/web/static
RUN groupadd -r user
RUN useradd --no-log-init -r -g user user

RUN chown -R user:user /vol
RUN chmod -R 755 /vol/web
USER user

Python Modules - 
aiohttp==3.8.1
aiosignal==1.2.0
amqp==5.1.1
anyio==3.6.1
asgiref==3.5.2
asttokens==2.0.5
async-timeout==4.0.2
attrs==21.4.0
aws-requests-auth==0.4.3
Babel==2.9.1
backcall==0.2.0
billiard==3.6.4.0
black==22.6.0
boto3==1.9.62
botocore==1.12.253
bytecode==0.13.0
celery==5.2.7
certifi==2022.6.15
charset-normalizer==2.0.12
click==8.1.3
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.2.0
ddsketch==2.0.4
ddtrace==1.3.0
decorator==5.1.1
Django==4.0.1
django-appconf==1.0.5
django-cache-memoize==0.1.10
django-compressor==3.1
django-compressor-autoprefixer==0.1.0
django-cors-headers==3.11.0
django-datadog-logger==0.5.0
django-elasticsearch-dsl==7.2.2
django-elasticsearch-dsl-drf==0.22.4
django-environ==0.8.1
django-extensions==3.1.5
django-libsass==0.9
django-log-request-id==2.0.0
django-nine==0.2.5
django-prometheus==2.2.0
django-sites==0.11
django-storages==1.12.3
django-uuid-upload-path==1.0.0
django-versatileimagefield==2.2
djangorestframework==3.13.1
djangorestframework-gis==0.18
docutils==0.15.2
elasticsearch==7.17.1
elasticsearch-dsl==7.4.0
executing==0.8.3
frozenlist==1.3.0
geographiclib==1.52
geopy==2.2.0
gunicorn==20.1.0
h11==0.12.0
httpcore==0.14.7
httpx==0.21.3
idna==3.3
ipython==8.0.1
jedi==0.18.1
jmespath==0.10.0
JSON-log-formatter==0.5.1
kombu==5.2.4
libsass==0.21.0
matplotlib-inline==0.1.3
multidict==6.0.2
mypy-extensions==0.4.3
packaging==21.3
parso==0.8.3
pathspec==0.9.0
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.2.0
platformdirs==2.5.2
prometheus-client==0.14.1
prompt-toolkit==3.0.30
protobuf==4.21.2
psycopg2-binary==2.9.3
ptyprocess==0.7.0
pure-eval==0.2.2
Pygments==2.12.0
pyparsing==3.0.9
python-dateutil==2.8.2
python-dotenv==0.19.2
python-magic==0.4.27
pytz==2021.3
rcssmin==1.1.0
regex==2022.1.18
requests==2.27.1
rfc3986==1.5.0
rjsmin==1.2.0
s3transfer==0.1.13
six==1.16.0
sniffio==1.2.0
sqlparse==0.4.2
stack-data==0.3.0
tenacity==8.0.1
tomli==2.0.1
traitlets==5.3.0
typing_extensions==4.3.0
urllib3==1.25.11
uvicorn==0.17.0
vine==5.0.0
wcwidth==0.2.5
whitenoise==5.3.0
yarl==1.7.2

Docker build command - docker buildx build --push --platform linux/amd64,linux/arm64 -t ${ECR_LATEST_TAGX} -t ${ECR_VERSION_TAGX} --output=type=image --file ../incr.Dockerfile ../

Have you verified that your installation of the python modules is the same between c5 and c6g? Python modules sometimes contain native code in their wheels, and if that native code is missing on Graviton it could be using slower fallback code written in Python leading to a slowdown. For help with tracking down performance issues, I'd recommend looking at the AWS Graviton getting started guide - https://github.com/aws/aws-graviton-getting-started — Geoffrey Blake, Aug 03 '22 at 19:34
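One way to make the check suggested above concrete is to list which installed packages actually ship compiled files on each architecture. The following is a minimal sketch using only the standard library; the function name `native_extension_report` is hypothetical, and it assumes packages were installed with pip so that their file lists are recorded in the package metadata.

```python
import platform
from importlib import metadata


def native_extension_report():
    """Map each installed distribution to the compiled files it ships."""
    report = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"] or "unknown"
        # dist.files can be None when the installer recorded no file list.
        compiled = [
            str(path)
            for path in (dist.files or [])
            if path.suffix in (".so", ".pyd")
        ]
        if compiled:
            report[name] = compiled
    return report


if __name__ == "__main__":
    # Run this inside the container on both c5 and c6g and diff the output.
    print("machine:", platform.machine())
    for name, files in sorted(native_extension_report().items()):
        print(f"{name}: {len(files)} compiled file(s)")
```

If a package appears in the c5 output but not in the c6g output, it is probably running a pure-Python fallback on Graviton. Identical version numbers in `pip freeze` alone would not reveal that difference.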
Thanks, @GeoffreyBlake. I have verified the Python modules, and they are the same on both c5 and c6g. — Manish Agrawal, Aug 04 '22 at 10:37
I have also edited the post by adding docker configurations and installed python modules. — Manish Agrawal, Aug 04 '22 at 10:38
The Dockerfile and requirements.txt look reasonable to me. What else is running on the instance besides the gunicorn process? It's only using 1 worker and 2 threads, which I assume means the 2xl and 4xl instances are almost entirely idle? Does increasing the number of workers help? How do P50, P99 and average latencies compare? I only see P90 mentioned. Is it possible to get a profile of your two running deployments? I see this tool as potentially useful: https://github.com/benfred/py-spy/tree/master/src — Geoffrey Blake, Aug 04 '22 at 14:50
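For comparing P50, P90, P99 and average latency outside of Datadog, a small standard-library script may be enough. This is only a sketch: `summarize` and `sample` are hypothetical helper names, and the URL to probe (for example the readiness endpoint of a pod) is an assumption you would need to fill in.

```python
import statistics
import time
import urllib.request


def summarize(latencies_ms):
    """Return average, P50, P90 and P99 for a list of latencies in ms."""
    # n=100 yields 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "avg": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p90": cuts[89],
        "p99": cuts[98],
    }


def sample(url, count=200, timeout=10.0):
    """Time `count` sequential GET requests against `url`."""
    latencies_ms = []
    for _ in range(count):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms
```

Calling `summarize(sample(url))` against the same endpoint from a pod on each node type would show whether the whole distribution shifts on Graviton or whether only the tail (P90 and above) is affected, which would point toward occasional stalls rather than uniformly slower code.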
As I am using Kubernetes, we increase the number of pods to increase the number of workers, and doing so is not helping much. To measure the metrics I am using Datadog. I have also profiled the code (using Datadog) and added the profiling snapshots here - https://drive.google.com/drive/folders/1I3PXnjOpcpBVM3O3jsw-NRjWiHdZDFrv?usp=sharing @GeoffreyBlake Thank you for suggesting the profiling. I am also trying to figure out the differences in resource consumption. — Manish Agrawal, Aug 05 '22 at 13:24
Interesting, I don't know what feed-service does, but it seems to go crazy on Graviton every so often with 5s (!) stalls. Is the CPU utilization 100% doing something, or is it blocked waiting instead? If the CPU util is 100%, I would think the culprit would show up in a cpu profile of hot functions. Django, postgres and everything else look like noise. — Geoffrey Blake, Aug 05 '22 at 20:05
The feed-service is my microservice. In the above profiling, I hadn't even put any load on the server. The default health check (readiness) is being hit periodically by Kubernetes, and its 5s stall is very strange. This is only happening on the ARM-based processor. CPU and memory usage are also not reaching 100%. Is it possible that uvicorn (ASGI) is not working properly (or not executing tasks properly) on ARM? — Manish Agrawal, Aug 06 '22 at 15:45
We are using uvicorn because the actual APIs need asynchronous programming. We have one more application (which does not need asynchronous support) running on an ARM-based processor, and it's running perfectly fine. — Manish Agrawal, Aug 06 '22 at 15:48
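A stall with low CPU usage could mean the event loop itself is blocked on something synchronous. One way to test that hypothesis is a small watchdog task that measures how late the loop wakes it up. The sketch below is an assumption about how one might instrument this, not part of the application; `monitor_loop_lag` and its parameters are hypothetical names.

```python
import asyncio
import time


async def monitor_loop_lag(interval=0.1, threshold=0.5, on_stall=print):
    """Report whenever the event loop wakes this task later than expected."""
    while True:
        start = time.perf_counter()
        await asyncio.sleep(interval)
        # Anything beyond `interval` is time the loop spent unable to run us.
        lag = time.perf_counter() - start - interval
        if lag > threshold:
            on_stall(f"event loop stalled for about {lag:.2f}s")
```

The task would need to be started once per worker with `asyncio.create_task(monitor_loop_lag())` from code that already runs inside the event loop. If it logs stalls of around 5s on Graviton only, the next step would be finding which synchronous call holds the loop. Setting the environment variable `PYTHONASYNCIODEBUG=1` is another option: in debug mode asyncio logs callbacks that run longer than 100 ms.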
My main question would be: is your install of uvicorn using the async programming facility differently on Arm than on x86? Is there a fallback path being used? A 5 second response to a health check is really strange. Does this only happen with your service and K8s, or does a bare install of uvicorn on a VM on Arm see this? — Geoffrey Blake, Aug 08 '22 at 15:34
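On the fallback question: as far as I know, `uvicorn.workers.UvicornWorker` runs with `loop="auto"` and `http="auto"`, which picks uvloop and httptools when they can be imported and otherwise falls back to the standard asyncio loop and h11. A quick way to see which path each image would take is sketched below; `uvicorn_auto_choices` is a hypothetical helper that mirrors that selection logic and does not call into uvicorn itself.

```python
import importlib.util
import platform


def uvicorn_auto_choices():
    """Guess which loop and HTTP parser uvicorn's "auto" settings would use."""
    has_uvloop = importlib.util.find_spec("uvloop") is not None
    has_httptools = importlib.util.find_spec("httptools") is not None
    return {
        "machine": platform.machine(),
        "loop": "uvloop" if has_uvloop else "asyncio",
        "http": "httptools" if has_httptools else "h11",
    }


if __name__ == "__main__":
    print(uvicorn_auto_choices())
```

The posted module list contains h11 but neither uvloop nor httptools, so both the c5 and the c6g images are probably already on the asyncio and h11 path. If the output is identical on both architectures, a missing native accelerator in uvicorn is unlikely to explain the difference.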
@GeoffreyBlake We haven't tested directly on an ARM VM (without Kubernetes). I will do that and also check the latency with a minimal install (with uvicorn). — Manish Agrawal, Aug 10 '22 at 07:02
