I am currently working with HDFS, Apache Livy and Django. The goal is to send a request that runs code stored in HDFS and calls Livy to create batches. For now, everything is working: I have a basic wordcount script stored in HDFS along with a .txt file, and on an HTML page I have a simple button that launches the whole process.

I can produce the wordcount result. My next step is to get information from Livy, for instance the IDs of the sessions (or batches) that are currently starting, running, dead or successful, as some sort of callback. I need this to refresh by itself so that I know what state every session is in. To do so, I thought I could use django-cron, but I can't manage to set it up correctly. I get no errors, but nothing happens. What am I missing?

I am working on CentOS 7 in a Conda environment with Python 3.6, using the latest releases of Django, Livy and HDFS.

Here are my current files:

livy.html

{% load static %}

<html>
<body>
<div id="div1">

{{result.sessions}}

</div>

<form action="#" method="get">
 <input type="text" name="mytextbox" />
 <input type="submit" class="btn" value="Click" name="mybtn">
</form>

</body>
</html>

views.py

from django.shortcuts import render
from django.http import HttpResponse
from django_cron import CronJobBase, Schedule
import wordcount, livy

# Create your views here.

class CheckIdCronJob(CronJobBase):
    RUN_EVERY_MINS = 1 # every minute

    schedule = Schedule(run_every_mins=RUN_EVERY_MINS)
    code = 'button.CheckIdCronJob'    # a unique code

    def index(request):
        if(request.GET.get('mybtn')):
            r = livy.send(request.GET.get('mytextbox')) #(/test/LICENSE.txt)
            return render(request,'button/livy.html', {'result':r})
        return render(request,'button/livy.html')

livy.py

import json, pprint, requests, textwrap

def send(inputText):
    host = 'http://localhost:8998'
    data = {"file":"/myapp/wordcount.py", "args":[inputText,"2"]}
    headers = {'Content-Type': 'application/json'}
    r = requests.post(host + '/batches', data=json.dumps(data), headers=headers)
    r = requests.get(host + '/batches' + '', data=json.dumps(data), headers=headers)
    return r.json()
Bromania
  • And you've followed all the instructions from the [docs](https://django-cron.readthedocs.io/en/latest/installation.html)? Including running a crontab that will kick-off the process every x minutes? (step 6) – dirkgroten Aug 08 '19 at 14:15
  • Thanks for your answer. I did all of this already, I don't think I've forgotten anything. I set everything to refresh every minute just to see if that was working, but I didn't see any output on my shell, nor any changes in my page (I still have to refresh manually to make it work) – Bromania Aug 08 '19 at 14:46
  • show us the output of `crontab -l` and look into your log file (cronjob.log or whatever you configured when creating the crontab.) – dirkgroten Aug 08 '19 at 14:50
  • It's saying "No crontab for root"... My guess is that something is definitely wrong xD And I can't find any log... I think I didn't create any crontab... But I did the whole installation from [installation](https://django-cron.readthedocs.io/en/latest/installation.html) and I didn't see any crontab specified... Sorry, I'm a beginner, thanks for your time... – Bromania Aug 09 '19 at 06:52
  • Read up on how to create a crontab with `crontab -e`. The installation guide shows an example at the end of step 6 but you should understand how it works in order to debug the issue. Maybe the user you used to create the crontab isn’t root (in general you shouldn’t run as root). But in any case the issue is your crontab file so you just need to fix that. – dirkgroten Aug 09 '19 at 07:45
  • Okay so, I started reading the documentation and I experimented a bit with crontab, and it seems to be working now. I mean, it's not doing what I want yet, but it's doing something at least! Thanks for your help. PS: Am I supposed to close the question or to accept an answer? It's my first post on StackOverflow :) – Bromania Aug 09 '19 at 09:32
  • let me post an answer that you can accept. – dirkgroten Aug 09 '19 at 09:33

1 Answer

What django-cron does is just make it easy to write jobs and specify how often or when they should run. You end up with one management command, ./manage.py runcrons, that checks all your jobs and runs those that are due.

What it doesn't do is call runcrons continuously, which is what you actually need if you want your jobs to run at the right moment. Basically, you want runcrons to be executed every minute (or every 10 minutes if the timing is not that critical), so you still need a system daemon to do that.

crontab is available on CentOS and can be used for just that purpose. The django-cron installation guide shows an example of how to create a crontab that runs runcrons every 5 minutes:
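As a rough sketch of the work such a job could do for this question: the function below asks Livy for its batches and maps each batch ID to its state. It assumes that GET /batches returns a JSON object with a "sessions" list whose items carry "id" and "state" (the template in the question already reads result.sessions). The names summarize_batches and fetch_batch_states are made up for this illustration, and the standard library is used instead of requests only to keep the sketch dependency-free. As far as I know, django-cron runs the do() method of each registered job class, so a call like fetch_batch_states() would go there rather than in a view.

```python
import json
import urllib.request

# Same Livy address as in livy.py from the question (an assumption for your setup).
LIVY_HOST = "http://localhost:8998"


def summarize_batches(payload):
    """Map each batch id to its state, given the parsed JSON body of GET /batches."""
    return {
        batch["id"]: batch.get("state", "unknown")
        for batch in payload.get("sessions", [])
    }


def fetch_batch_states(host=LIVY_HOST, timeout=10):
    """Query Livy for all batches and return a {batch_id: state} dict."""
    with urllib.request.urlopen(host + "/batches", timeout=timeout) as response:
        return summarize_batches(json.load(response))
```

The job could then store the returned dict somewhere the view can read it (a model or the cache, for instance), so the page shows the latest known states without the view having to call Livy itself.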

crontab -e
*/5 * * * * source /home/ubuntu/.bashrc && source /home/ubuntu/work/your-project/bin/activate && python /home/ubuntu/work/your-project/src/manage.py runcrons > /home/ubuntu/cronjob.log

You have to adapt that to fit your use case:

  • If you just run crontab -e, you edit the crontab of the user you're currently logged in as, and the job will run as that user. That might not be the right user to run the manage.py command, since that user needs to have the correct permissions to run your project. Use crontab -u user -e to edit the crontab of a different user.

    This is actually the complicated thing when running in production: Getting user permissions correct and getting the right user to run the various tasks. Normally you'd have a www-data or apache user that's running your server (and hence django app) and you want that same user to run the manage.py command. It should not be root running apache as that opens up security risks (your web server would have full access to the entire system).

  • The above command sources .bashrc to make sure the environment variables are set correctly. /home/ubuntu/ is just the user home directory for the user ubuntu. Change this appropriately.
  • The above command also activates the virtualenv so that the manage.py command can run with all the correct dependencies. Adapt the path to your virtualenv.
  • Finally you need to make sure the correct Django settings are activated, either by having DJANGO_SETTINGS_MODULE environment variable set (which you can do in .bashrc hence the source earlier) or by passing the --settings path.to.settings option to manage.py.
  • The last part is directing the output of the task to a log file, so you can troubleshoot if there are issues. Please also add 2>&1 at the end so that cron errors (stderr) are also directed to that same log.
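The adaptations listed above can be put together in a small helper that assembles the crontab line from its parts. This is only an illustration: build_cron_line and its parameters are hypothetical names, and the paths in the usage note are the placeholder paths from the example, not real ones. It keeps the > redirection from the example and appends 2>&1 as suggested.

```python
import shlex


def build_cron_line(every_minutes, bashrc, activate, manage_py, log_file, settings=None):
    """Assemble a crontab entry that runs `manage.py runcrons` every N minutes."""
    command = "python {} runcrons".format(shlex.quote(manage_py))
    if settings:
        # Optional: select the Django settings module explicitly.
        command += " --settings={}".format(shlex.quote(settings))
    return "*/{} * * * * source {} && source {} && {} > {} 2>&1".format(
        every_minutes,
        shlex.quote(bashrc),    # sets the environment variables
        shlex.quote(activate),  # activates the virtualenv
        command,
        shlex.quote(log_file),  # stdout and stderr both end up here
    )
```

Calling build_cron_line(5, "/home/ubuntu/.bashrc", "/home/ubuntu/work/your-project/bin/activate", "/home/ubuntu/work/your-project/src/manage.py", "/home/ubuntu/cronjob.log") reproduces the line shown earlier with 2>&1 appended; the result is what you would paste into the editor opened by crontab -e.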

To check your crontab, run crontab -l (for the currently logged in user) or crontab -l -u user for a different user.
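If you prefer to check this from Python, here is a minimal sketch that reads the crontab and looks for a runcrons entry. The function names are hypothetical, and it assumes the crontab binary is on the PATH; when there is no crontab for the user, crontab -l exits with an error and the helper simply returns an empty string.

```python
import subprocess


def read_crontab(user=None):
    """Return the output of `crontab -l` (optionally for another user), or '' on failure."""
    cmd = ["crontab", "-l"]
    if user:
        cmd += ["-u", user]
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True
    )
    return result.stdout if result.returncode == 0 else ""


def find_runcrons_entries(crontab_text):
    """Return the non-comment crontab lines that mention runcrons."""
    entries = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if "runcrons" in stripped:
            entries.append(stripped)
    return entries
```

An empty list from find_runcrons_entries(read_crontab()) means nothing is scheduling runcrons for that user, which matches the "no crontab for root" situation from the comments.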

dirkgroten