I'm working on a project and would like to scrape article headlines and post dates on certain topics from CNN. I have done some scraping before (extracting tables from Wiki), but this time I failed to extract the information I want. Here is my code:

import requests
from bs4 import BeautifulSoup

link = 'https://www.cnn.com/search?q=tesla&size=10&category=us'
cnn = requests.get(link)
soup = BeautifulSoup(cnn)
soup.find_all(class_="cnn-search__result-headline")

This code returns nothing. I have tried to figure out the problem without success, and it has bothered me for two days. Many thanks to anyone who can help me solve it.

Joe
  • 1
    Does it need to be with Beautiful Soup? I find Selenium with Python gives great results, and it's quite simple too. – libby Sep 11 '20 at 14:22
  • you need to parse cnn.content and not cnn – AdForte Sep 11 '20 at 14:23
  • 4
    It seems that CNN loads the headlines and other data via javascript. BeautifulSoup does not do javascript. I'd use Selenium instead. –  Sep 11 '20 at 14:27
  • Got it. I've never learned JavaScript before, so I have no idea what is going on here. I will try Selenium, thanks for your help! – Joe Sep 11 '20 at 14:31
  • Does this answer your question? [CNN Scraper sporadically working in python](https://stackoverflow.com/questions/61146746/cnn-scraper-sporadically-working-in-python) – αԋɱҽԃ αмєяιcαη Sep 11 '20 at 22:34
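
The two points raised in the comments can be checked without a browser. Passing `cnn.text` (or `cnn.content`) instead of the `Response` object fixes the parsing call, but if the headlines are injected by JavaScript, the downloaded HTML still will not contain them. Here is a minimal sketch of that check. Both HTML strings are made up for illustration (they are not CNN's real markup), and `count_matches` is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Made-up HTML: roughly what a JavaScript-driven page might look like
# when fetched with requests (an empty container plus a script tag).
STATIC_SHELL = """
<html><body>
  <div id="search-results"></div>
  <script src="/app.js"></script>
</body></html>
"""

# Made-up HTML: the same page after a browser has run the script.
RENDERED_PAGE = """
<html><body>
  <div id="search-results">
    <h3 class="cnn-search__result-headline">Example headline</h3>
  </div>
</body></html>
"""


def count_matches(html, css_class):
    """Count the elements carrying the given class in an HTML string."""
    soup = BeautifulSoup(html, 'html.parser')
    return len(soup.find_all(class_=css_class))


# Zero matches in the downloaded HTML suggests the content is added later
# by JavaScript, which BeautifulSoup alone cannot execute.
print(count_matches(STATIC_SHELL, 'cnn-search__result-headline'))
print(count_matches(RENDERED_PAGE, 'cnn-search__result-headline'))
```

If the count for the real `cnn.text` is zero while the element is visible in the browser's inspector, a browser-driving tool such as Selenium is likely needed, as suggested above.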

1 Answer

Here is how to use bs4. This example fetches the latest news item from CNBC:

import requests
from bs4 import BeautifulSoup

link = 'https://www.cnbc.com/world/?region=world'
page = requests.get(link)
# Parse the response body (page.text), not the Response object itself,
# and name the parser explicitly.
soup = BeautifulSoup(page.text, 'html.parser')

# find() returns only the first matching element, so look it up once and reuse it.
headline = soup.find('a', attrs={'class': 'LatestNews-headline'})
headline_news = headline.text.strip()
link_news = headline['href'].strip()
time_news = soup.find('span', attrs={'class': 'LatestNews-wrapper'}).text.strip()
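
Note that `find()` returns only the first match, and returns `None` when nothing matches, so chaining `.text` onto it can raise an `AttributeError`. To collect every headline, `find_all()` is the usual choice. Below is a minimal sketch, assuming the same `LatestNews-headline` class as above. The HTML sample is made up so the snippet runs without a network request, and `extract_headlines` is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page; only the class name
# 'LatestNews-headline' is taken from the code above.
SAMPLE_HTML = """
<ul>
  <li><a class="LatestNews-headline" href="/2020/09/11/first.html"> First headline </a></li>
  <li><a class="LatestNews-headline" href="/2020/09/11/second.html">Second headline</a></li>
  <li><a class="Other-link" href="/about.html">About</a></li>
</ul>
"""


def extract_headlines(html):
    """Return a list of (title, href) tuples for every matching headline link."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    # find_all() returns every match, or an empty list if there are none,
    # so the loop simply does nothing when the page has no headlines.
    for a in soup.find_all('a', class_='LatestNews-headline'):
        results.append((a.get_text(strip=True), a.get('href', '')))
    return results


headlines = extract_headlines(SAMPLE_HTML)
for title, href in headlines:
    print(title, href)
```

With a live page, `page.text` from the request above would take the place of `SAMPLE_HTML`.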
Eric Aya