What is the easiest way to launch a celery beat and worker process in my django pod?

I'm migrating my OpenShift v2 Django app to OpenShift v3 on a Pro subscription. I'm really a noob with OpenShift v3, Docker, containers and Kubernetes. I used this tutorial https://blog.openshift.com/migrating-django-applications-openshift-3/ to migrate my app, and it works pretty well.

I'm now struggling with how to start Celery. On OpenShift v2 I just used a post_start action hook:

source $OPENSHIFT_HOMEDIR/python/virtenv/bin/activate

python $OPENSHIFT_REPO_DIR/wsgi/podpub/manage.py celery worker \
    --pidfile="$OPENSHIFT_DATA_DIR/celery/run/%n.pid" \
    --logfile="$OPENSHIFT_DATA_DIR/celery/log/%n.log" \
    -c 1 \
    --autoreload &

python $OPENSHIFT_REPO_DIR/wsgi/podpub/manage.py celery beat \
    --pidfile="$OPENSHIFT_DATA_DIR/celery/run/celeryd.pid" \
    --logfile="$OPENSHIFT_DATA_DIR/celery/log/celeryd.log" &

It is quite a simple setup. It just uses the Django database as the message broker; no RabbitMQ or anything like that.
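
For reference, using the Django database as the broker with the old django-celery 3.x stack is configured along these lines in settings.py (a sketch of the standard setup rather than an exact copy of mine):

import djcelery
djcelery.setup_loader()

BROKER_URL = 'django://'           # use the Django database as the message broker

INSTALLED_APPS += (
    'djcelery',                    # provides the "manage.py celery ..." commands
    'kombu.transport.django',      # kombu's Django database transport tables
)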

Would an OpenShift "job" be appropriate for that? Or would it be better to use powershift-image (https://pypi.python.org/pypi/powershift-image) action commands? I did not understand how to execute those.

Here is the current deployment configuration for my only app "django":

apiVersion: v1
kind: DeploymentConfig
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2017-12-27T22:58:31Z
  generation: 67
  labels:
    app: django
  name: django
  namespace: myproject
  resourceVersion: "68466321"
  selfLink: /oapi/v1/namespaces/myproject/deploymentconfigs/django
  uid: 64600436-ab49-11e7-ab43-0601fd434256
spec:
  replicas: 1
  selector:
    app: django
    deploymentconfig: django
  strategy:
    activeDeadlineSeconds: 21600
    recreateParams:
      timeoutSeconds: 600
    resources: {}
    rollingParams:
      intervalSeconds: 1
      maxSurge: 25%
      maxUnavailable: 25%
      timeoutSeconds: 600
      updatePeriodSeconds: 1
    type: Recreate
  template:
    metadata:
      annotations:
        openshift.io/generated-by: OpenShiftNewApp
      creationTimestamp: null
      labels:
        app: django
        deploymentconfig: django
    spec:
      containers:
      - image: docker-registry.default.svc:5000/myproject/django@sha256:6a0caac773acc65daad2e6ac87695f9f01ae3c99faba14536e0ec2b65088c808
        imagePullPolicy: Always
        name: django
        ports:
        - containerPort: 8080
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/app-root/src/data
          name: data
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: django-data
  test: false
  triggers:
  - type: ConfigChange
  - imageChangeParams:
      automatic: true
      containerNames:
      - django
      from:
        kind: ImageStreamTag
        name: django:latest
        namespace: myproject
      lastTriggeredImage: docker-registry.default.svc:5000/myproject/django@sha256:6a0caac773acc65daad2e6ac87695f9f01ae3c99faba14536e0ec2b65088c808
    type: ImageChange

I'm using mod_wsgi-express and this is my app.sh:

ARGS="$ARGS --log-to-terminal"
ARGS="$ARGS --port 8080"
ARGS="$ARGS --url-alias /static wsgi/static"

exec mod_wsgi-express start-server $ARGS wsgi/application

Help is very much appreciated. Thank you.

user3620060
  • Easiest is probably to have an environment variable in the startup script which is checked to see whether to run a worker or beat process. Then have two deployments: one for the worker, and another which sets the environment variable to run the beat process instead. The deployments would be the same, except for the name of the deployment and the environment variable. (A sketch of this approach follows these comments.) – Graham Dumpleton Jan 04 '18 at 21:04
  • Thanks for your answer, but I think I still know too little about OpenShift to really understand it. Are you suggesting that I add further containers to my deployment configuration, or that I create separate deployment configurations / apps for the Celery worker and beat? If I have separate apps for Celery, how would they be able to interact with my Django app? I'm using an SQLite database on persistent storage. I added my deployment config for clarity. – user3620060 Jan 04 '18 at 23:24
  • The use of the persistent volume with SQLite complicates things, as I was expecting you were using a separate database instance. Is there a reason you couldn't use a separate PostgreSQL database which both Django and Celery can then interact with? – Graham Dumpleton Jan 05 '18 at 01:15
  • BTW, what WSGI server are you using? If you are using mod_wsgi-express, there is a feature of it that you could use to run and manage the Celery processes. That way it can all be done in the one container. If you are using mod_wsgi-express, show me your current ``app.sh`` file which you use to start it and I can then suggest edits and what else to do. – Graham Dumpleton Jan 05 '18 at 01:16
  • Actually there is no specific reason why I need to use SQLite. I just wanted a quick and dirty migration of my OpenShift v2 application, which was using SQLite. I'm willing to move to a PostgreSQL database later. I have added my app.sh to the post. – user3620060 Jan 05 '18 at 14:40
  • @GrahamDumpleton were you thinking about the --service-script option in mod_wsgi-express? I tried that one; see my answer below. If you have any further suggestions I'll be happy to read them. Anyway, thank you very much for your help (once again :-) ). – user3620060 Jan 06 '18 at 01:20
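
For illustration, the environment-variable approach from the first comment could look roughly like this (a sketch only; the variable name CELERY_PROCESS_TYPE and the --loglevel values are made up, and each DeploymentConfig would set the variable differently):

#!/bin/bash
# Sketch of an app.sh that serves three deployments from one image.
# CELERY_PROCESS_TYPE is a hypothetical variable: the web DeploymentConfig
# leaves it unset, a second one sets it to "worker", a third to "beat".

case "$CELERY_PROCESS_TYPE" in
  worker)
    exec python wsgi/podpub/manage.py celery worker -c 1 --loglevel=info
    ;;
  beat)
    exec python wsgi/podpub/manage.py celery beat --loglevel=info
    ;;
  *)
    ARGS=""
    ARGS="$ARGS --log-to-terminal"
    ARGS="$ARGS --port 8080"
    ARGS="$ARGS --url-alias /static wsgi/static"
    exec mod_wsgi-express start-server $ARGS wsgi/application
    ;;
esac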

1 Answer


I have managed to get it working, though I'm not quite happy with it. I will move to a PostgreSQL database very soon. Here is what I did:

mod_wsgi-express has an option called --service-script which starts an additional process besides the actual app. So I updated my app.sh:

#!/bin/bash

ARGS=""

ARGS="$ARGS --log-to-terminal"
ARGS="$ARGS --port 8080"
ARGS="$ARGS --url-alias /static wsgi/static"
ARGS="$ARGS --service-script celery_starter scripts/startCelery.py"

exec mod_wsgi-express start-server $ARGS wsgi/application

Mind the last ARGS=... line.

I created a Python script that starts up my Celery worker and beat, scripts/startCelery.py:

import subprocess

# Paths inside the S2I Python container, mirroring the old OpenShift v2 names
OPENSHIFT_REPO_DIR = "/opt/app-root/src"
OPENSHIFT_DATA_DIR = "/opt/app-root/src/data"

pathToManagePy = OPENSHIFT_REPO_DIR + "/wsgi/podpub"

# Start the Celery worker in the background
worker_cmd = [
    "python",
    pathToManagePy + "/manage.py",
    "celery",
    "worker",
    "--pidfile=" + OPENSHIFT_REPO_DIR + "/%n.pid",
    "--logfile=" + OPENSHIFT_DATA_DIR + "/celery/log/%n.log",
    "-c", "1",
    "--autoreload",
]
print(worker_cmd)

subprocess.Popen(worker_cmd, close_fds=True)

# Start the Celery beat scheduler in the background
beat_cmd = [
    "python",
    pathToManagePy + "/manage.py",
    "celery",
    "beat",
    "--pidfile=" + OPENSHIFT_REPO_DIR + "/celeryd.pid",
    "--logfile=" + OPENSHIFT_DATA_DIR + "/celery/log/celeryd.log",
]
print(beat_cmd)

subprocess.Popen(beat_cmd)

This was actually working, but when I tried to launch the Celery worker I kept receiving the message "Running a worker with superuser privileges when the worker accepts messages serialized with pickle is a very bad idea! If you really want to continue then you have to set the C_FORCE_ROOT environment variable (but please think about this before you do)."

Even though I added the following settings to my settings.py in order to remove the pickle serializer, it kept giving me that same error message:

CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json']

I don't know why. In the end I added C_FORCE_ROOT to my .s2i/environment file:

C_FORCE_ROOT=true

Now it's working, at least I think so. My next job will only run in a few hours. I'm still open to any further suggestions and tips.

user3620060
  • The service script was indeed what I was going to recommend. I would, though, use a separate script for starting the Celery main daemon and the beat process. I would also use ``os.execl()`` so that the Celery process replaces the script process, rather than running it as a subprocess. This is necessary to ensure it receives signals properly. (A sketch of this variant follows these comments.) – Graham Dumpleton Jan 06 '18 at 02:12
  • I don't understand the ``root`` warning because if you are using the S2I builder, it doesn't run as ``root``. It might though have an over-zealous check and also be checking whether it is running as group ``root``. Running as group ``root`` is perfectly okay, so it shouldn't be complaining. – Graham Dumpleton Jan 06 '18 at 02:13
  • It is a flaw in Celery as far as I am concerned. They have https://github.com/celery/celery/blob/a3c377474ab1109a26de5169066a4fae0d30524b/celery/platforms.py#L785 so it does check for group ``root`` when there isn't really any reason that is bad. – Graham Dumpleton Jan 06 '18 at 02:17
  • Because of that strange restriction in Celery, I don't see an issue with setting ``C_FORCE_ROOT=true``. – Graham Dumpleton Jan 06 '18 at 02:28
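
For illustration, the ``os.execl()`` variant from the comments could look roughly like this: one service script per Celery process, each registered with its own --service-script option (the file name start_celery_worker.py and the use of sys.executable are assumptions for this sketch, not part of the original setup):

# scripts/start_celery_worker.py -- illustrative sketch; register it with
#   --service-script celery_worker scripts/start_celery_worker.py
# and add an analogous script and option for the beat process.
import os
import sys

OPENSHIFT_REPO_DIR = "/opt/app-root/src"
OPENSHIFT_DATA_DIR = "/opt/app-root/src/data"
manage_py = OPENSHIFT_REPO_DIR + "/wsgi/podpub/manage.py"

# os.execl replaces this script's process with the Celery worker, so shutdown
# signals from mod_wsgi-express reach Celery directly instead of a wrapper.
os.execl(
    sys.executable, sys.executable, manage_py,
    "celery", "worker",
    "-c", "1",
    "--logfile=" + OPENSHIFT_DATA_DIR + "/celery/log/%n.log",
)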