
I wrote a mini-app that scrapes my school's website, looks for the title of the latest post, and compares it to the previous title; if they differ, it sends me an email. For the app to work properly, it needs to keep running 24/7 so that the value of the title variable stays correct. Here's the code:

import requests
from bs4 import BeautifulSoup
import schedule, time
import sys
import smtplib


#Mailing Info

from_addr = ''
to_addrs = ['']

message = """From: sender
To: receiver
Subject: New Post

A new post has been published
visit the website to view it: 
"""


def send_mail(msg):
    try:
        s = smtplib.SMTP('localhost')
        s.login('email', 'password')

        s.sendmail(from_addr, to_addrs, msg)
        s.quit()
    except smtplib.SMTPException as e:
        print(e)


#Scraping
URL = ''

title = 'Hello World'


def check():
    global title
    global message

    page = requests.get(URL)
    soup = BeautifulSoup(page.content, 'html.parser')

    main_section = soup.find('section', id='spacious_featured_posts_widget-2')
    first_div = main_section.find('div', class_='tg-one-half')

    current_title = first_div.find('h2', class_='entry-title').find('a')['title']

    if current_title != title:
        send_mail(message)
        title = current_title
    else:
        send_mail("Nothing New")


schedule.every(6).hours.do(check)

while True:
    schedule.run_pending()
    time.sleep(1)  # pause between checks instead of spinning the CPU

So my question is: how do I keep this code running on a host using cPanel? I know I can use cron jobs to run it every 2 hours or so, but I don't know how to keep the script itself running. Using a terminal doesn't work: when I close the page, the app gets terminated.

anas bouabid
  • You should explain why you need help with this; what's stopping it from running 24/7 as-is? It looks like it should already be doing that. Is it crashing? Is the host killing the process? Does the host occasionally reboot? All of those could have different solutions, and you might need multiple solutions depending on what the issue is. – Random Davis Apr 07 '21 at 17:43
  • This may be more of an OS or cPanel question. If you are running a Linux server under cPanel, then one way to get a Python process to run forever is: nohup python yourcode.py > logfile & (nohup keeps the process running when you disconnect, and the & launches it in the background so you can log off). Are you running a server on cPanel? – labroid Apr 07 '21 at 19:14
  • You could use the OS-provided scheduler, e.g., run your script via cron. – jfs Apr 07 '21 at 19:32

1 Answer


Generally, to run programs for an extended period, they need to be daemonised: essentially disconnected from your terminal with a double fork and a setsid call. That said, I've never actually done it myself, since it was usually either (a) the wrong solution, or (b) re-inventing the wheel (https://github.com/thesharp/daemonize).

In this case, I think a better course of action would be to invoke the script every 6 hours, rather than have it internally do something every 6 hours. Most systems are kept reliable by making programs resilient to a restart and putting them in a 'cradle' that automatically restarts them.

In your case, I'd suggest saving the title to a file, and reading from and writing to that file each time the script is invoked. It would make your script simpler and more robust, and you'd be using battle-hardened tools for the job.
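
The file-based approach could be sketched roughly as follows. This is only a minimal illustration, not a drop-in replacement: the file name last_title.txt and the helper names load_last_title, save_title and is_new_post are made up for this example and do not appear in the question's code.

```python
from pathlib import Path

# Hypothetical location for the saved title; any writable path would do.
STATE_FILE = Path("last_title.txt")


def load_last_title(path=STATE_FILE):
    """Return the title saved by the previous run, or None on the first run."""
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None


def save_title(title, path=STATE_FILE):
    """Persist the newest title so the next run can compare against it."""
    path.write_text(title, encoding="utf-8")


def is_new_post(current_title, path=STATE_FILE):
    """Compare with the stored title, update the file, and report a change."""
    if current_title == load_last_title(path):
        return False
    save_title(current_title, path)
    return True
```

With something like this, check() would no longer need the global title: it could call is_new_post(current_title) and send the mail only when that returns True, and the schedule loop at the bottom could be dropped in favour of a cron job that runs the script every 6 hours. Note that the very first run has no stored title, so it would report a new post.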

A couple of years down the line, when you're writing code that needs to survive the whole machine crashing and being replaced (within 6 hours, with everything installed), you can use some external form of storage (like a database) instead of a file to make your system even more resilient.
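
To show the shape of the database variant, here is a rough sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here, since it still stores data on the local machine; surviving a machine replacement would need a database hosted elsewhere. The table name state, the key last_title and the function name is_new_post_db are assumptions made for this example.

```python
import sqlite3


def is_new_post_db(conn, current_title):
    """Compare with the title stored in the database and update it on change."""
    # Hypothetical one-row key/value table holding the last seen title.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT)"
    )
    row = conn.execute(
        "SELECT value FROM state WHERE key = 'last_title'"
    ).fetchone()
    if row is not None and row[0] == current_title:
        return False
    conn.execute(
        "INSERT OR REPLACE INTO state (key, value) VALUES ('last_title', ?)",
        (current_title,),
    )
    conn.commit()
    return True
```

The calling code stays the same as with a file; only the place where the title lives changes, which is why the swap can wait until it is actually needed.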

Rory Browne
  • Thank you, It makes more sense this way! – anas bouabid Apr 07 '21 at 19:28
  • cron would be the preferred solution in this case. In general, one could delegate the service part to systemd (no need to daemonize inside your script) – jfs Apr 07 '21 at 19:35
  • Agreed; cron is the right battle-hardened tool for scheduling if it's available. I mentioned daemonize because, even though I disagreed with the approach, the question did specifically ask about running the script continuously, and I wasn't sure if systemd would be available. Neither systemd nor daemonize would be required for the approach I recommended. – Rory Browne Apr 07 '21 at 21:48