
I am trying to create a website in Django that scrapes data from Google News and displays it on my site. However, I don't know how to use the data I extract from Google News in my Django HTML template. Is there a way to do that?

Also, the scraping slows the website down a lot, so is this the best way to do it?

The web scraping code:

from bs4 import BeautifulSoup
import requests
url = "https://news.google.com/?hl=en-IN&gl=IN&ceid=IN:en"
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
n = 1
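# attrs is a dict mapping the attribute name to the value to match
for link in soup.find_all('h3', {'class': 'ipQwMb ekueJc RD0gLb'}):
    title = link.get_text()  # also works when the <h3> has more than one child node
    for a in link.find_all('a', {'class': 'DY5T1d'}):
        href = a.get('href')
        # drop only the leading "." of the relative href, not every "." in the path
        link_href = href.lstrip(".")
        print("(" + str(n) + ")" + title + "\n" + "https://news.google.com" + link_href)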
        n += 1
Heisenberg
  • The best way would be to add a background task (e.g. a cron job) that fetches the data from Google News and stores it in your DB. Your pages then read from the DB, which should be much faster than your current implementation. There are a couple of options for background tasks, such as celery and rq. – Vishvajit Pathak Sep 01 '19 at 11:58
  • I kind of get what you're talking about, but I still don't know how to implement it; I hope you can explain more. Also, Google News updates about every other hour, so this would store a lot of information in the database. Is that a good idea? – Heisenberg Sep 01 '19 at 13:35
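
A rough sketch of the store-then-read idea from the comments above, using Python's built-in sqlite3 in place of Django's ORM so that it stays self-contained. The table and column names and the helpers `open_db`, `save_headlines`, `prune_old`, and `latest_headlines` are hypothetical. The assumption is that the periodic job (cron, celery, rq) calls `save_headlines` with whatever the scraper returns, while the page only ever calls `latest_headlines`. Skipping links that are already stored and pruning old rows is one way to keep the table from growing without bound.

```python
import sqlite3
import time


def open_db(path=":memory:"):
    """Create the (hypothetical) headline table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS headline ("
        " link TEXT PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " fetched_at REAL NOT NULL)"
    )
    return conn


def save_headlines(conn, items, now=None):
    """Store (title, link) pairs; a link that is already stored is skipped."""
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO headline (link, title, fetched_at)"
            " VALUES (?, ?, ?)",
            [(link, title, now) for title, link in items],
        )


def prune_old(conn, max_age_seconds, now=None):
    """Delete rows older than max_age_seconds so the table stays small."""
    now = time.time() if now is None else now
    with conn:
        conn.execute(
            "DELETE FROM headline WHERE fetched_at < ?",
            (now - max_age_seconds,),
        )


def latest_headlines(conn, limit=20):
    """What the page would read: newest rows first, no network access."""
    rows = conn.execute(
        "SELECT title, link FROM headline ORDER BY fetched_at DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()
```

In a Django project the same shape would normally be a model plus a scheduled task; sqlite3 is swapped in here only for illustration.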

1 Answer


Even though this post is old, my answer might help others along the way ;) To keep the page from slowing down, run the scraping (or any other time-consuming process) in a separate thread, so that each long-running task gets its own thread and the request does not wait for it. There are many tutorials on multithreading on YouTube and Google, some of them specifically for Django. Best of luck and enjoy coding :)
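
One way the threading idea could look, sketched with only the standard library. `scrape_headlines` is a stand-in for the scraping code in the question, and `HeadlineCache`, `get_headlines`, and the 15-minute refresh interval are assumptions made for illustration, not part of Django. The request path returns whatever is cached right away and, if the cache is stale, starts a single background thread to refresh it.

```python
import threading
import time


def scrape_headlines():
    """Placeholder for the slow requests/BeautifulSoup code in the question."""
    return [("Example title", "https://news.google.com/articles/example")]


class HeadlineCache:
    """Keeps the last scrape result and refreshes it in a background thread."""

    def __init__(self, fetch, max_age_seconds=900):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._lock = threading.Lock()
        self._items = []
        self._fetched_at = None
        self._refreshing = False
        self._thread = None

    def _refresh(self):
        try:
            items = self._fetch()  # slow part, runs outside the lock
            with self._lock:
                self._items = items
                self._fetched_at = time.monotonic()
        finally:
            with self._lock:
                self._refreshing = False

    def get_headlines(self):
        """Return cached items immediately; start a refresh if they are stale."""
        with self._lock:
            stale = (
                self._fetched_at is None
                or time.monotonic() - self._fetched_at > self._max_age
            )
            start = stale and not self._refreshing
            if start:
                self._refreshing = True
            items = list(self._items)
        if start:
            self._thread = threading.Thread(target=self._refresh, daemon=True)
            self._thread.start()
        return items
```

A view could then pass the result to the template, for example `render(request, "news.html", {"headlines": cache.get_headlines()})` with a hypothetical `news.html` that loops over it using `{% for title, link in headlines %}`. Note that the very first call returns an empty list, and that each server process would hold its own copy of the cache.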

  • Please provide a coding example based on the OP's code to improve quality. Answers with code and an explanation are usually more helpful and of better quality, and are more likely to attract upvotes. – ZF007 Oct 22 '20 at 10:38