I wrote some code to scrape data from a site, but there is a problem. The site is a news portal.
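import requests
from bs4 import BeautifulSoup

# mainPage_soup: the parsed front page of the portal, created earlier in the script
articleIndex = 0
for div in mainPage_soup.find_all('div', attrs={'class': 'title'}):
    if articleIndex >= 2:  # only process the first two articles
        break
    articleIndex += 1
    # article_url: the link of the current article, defined elsewhere in the full script
    article = requests.get(article_url)
    article_soup = BeautifulSoup(article.content, "html.parser")

    # Read the date from the article page itself (the snippet used an undefined `html`).
    # find() returns None when nothing matches, and .get_text() on None raises the NoneType error.
    d = article_soup.find('time', class_='article-datetime').get_text()
    print(d)

    article_content_str = ""
    content = article_soup.find('div', class_='article-content entry-content')
    for item in content.find_all('p'):  # same NoneType error if the div is not found
        article_content_str += "#" + item.text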
The site is hvg.hu.
I get a NoneType error for both the date and the <p> elements.
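For context, BeautifulSoup's find() returns None when no tag matches, so chaining .get_text() (or .find_all()) onto it raises AttributeError: 'NoneType' object has no attribute .... The sketch below reproduces that on a small made-up snippet (the class name "published" is hypothetical, not taken from hvg.hu) and wraps the lookup in a hypothetical helper, find_text, that returns None instead of crashing.

```python
from bs4 import BeautifulSoup


def find_text(soup, name, class_name):
    """Return the stripped text of the first matching tag, or None if there is no match."""
    tag = soup.find(name, class_=class_name)
    if tag is None:
        # find() found nothing; calling .get_text() here would raise
        # AttributeError: 'NoneType' object has no attribute 'get_text'
        return None
    return tag.get_text(strip=True)


# Made-up markup: the <time> tag uses a different class than the one searched for below
html = '<div><time class="published">2021. 03. 01.</time></div>'
soup = BeautifulSoup(html, "html.parser")

print(find_text(soup, "time", "published"))         # 2021. 03. 01.
print(find_text(soup, "time", "article-datetime"))  # None

# The unguarded version fails the same way as the code in the question
try:
    soup.find("time", class_="article-datetime").get_text()
except AttributeError as exc:
    print(exc)
```

A None result tells you the selector does not match the HTML you actually downloaded; it does not tell you why, so it is worth printing which lookup returned None first.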
The date is the article's release date,
and the <p> elements contain the article text, sentence by sentence.
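The paragraph loop can be written so that a missing content div yields None instead of an exception. This is a minimal sketch on made-up markup that reuses the class names from the question; the function name article_text is hypothetical, and it strips whitespace from each paragraph, which the original item.text did not do.

```python
from bs4 import BeautifulSoup


def article_text(soup, class_name="article-content"):
    """Join the text of every <p> inside the first matching div, each prefixed with '#'.

    Returns None when the div is missing, instead of failing on None.find_all().
    """
    # A single class name matches any div whose class list contains it
    body = soup.find("div", class_=class_name)
    if body is None:
        return None
    return "".join("#" + p.get_text(strip=True) for p in body.find_all("p"))


html = ('<div class="article-content entry-content">'
        '<p>First sentence.</p><p>Second sentence.</p></div>')
soup = BeautifulSoup(html, "html.parser")
print(article_text(soup))  # #First sentence.#Second sentence.
```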
I tried several things for the date (.text, get_text()), but nothing worked.
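One option for the date is to look up the <time> tag by name only, so the lookup no longer depends on the exact class. Many sites also put a machine-readable value in the tag's datetime attribute; whether hvg.hu does is an assumption, so the hypothetical helper below falls back to the visible text. Both snippets are made-up markup.

```python
from bs4 import BeautifulSoup


def article_date(soup):
    """Return the date from the first <time> tag, or None if the page has none."""
    tag = soup.find("time")
    if tag is None:
        return None
    # Prefer the machine-readable datetime attribute, fall back to the visible text
    return tag.get("datetime") or tag.get_text(strip=True)


with_attr = BeautifulSoup(
    '<time datetime="2021-03-01T10:00">2021. március 1. 10:00</time>', "html.parser")
text_only = BeautifulSoup("<time>2021. március 1. 10:00</time>", "html.parser")

print(article_date(with_attr))  # 2021-03-01T10:00
print(article_date(text_only))  # 2021. március 1. 10:00
```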
The same code works on other sites (after I change the class names).
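When code works on one site but not another, one possible cause (not confirmed for hvg.hu) is that the HTML that requests downloads differs from what the browser shows, for example because content is filled in by JavaScript or a consent page is served instead. A quick check is to search the raw response text for the class names before parsing it. The helper missing_markers and the page below are hypothetical.

```python
def missing_markers(html, markers):
    """Return the markers (e.g. class names) that never occur in the raw HTML string."""
    return [marker for marker in markers if marker not in html]


# Made-up page: an empty shell whose content would be filled in by JavaScript
shell = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
print(missing_markers(shell, ["article-datetime", "article-content"]))
# ['article-datetime', 'article-content']
```

In the real script this could be called as missing_markers(requests.get(article_url).text, ["article-datetime", "article-content"]). If the class names are absent from the downloaded text, no selector will find them; comparing with the browser's page source, or retrying with a browser-like header via requests.get(article_url, headers={"User-Agent": "..."}), may show what the server actually returns.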
I don't know where the problem is.
Maybe I chose the wrong divs?
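On the div question: in BeautifulSoup, class_ with a space-separated string such as 'article-content entry-content' is compared against the whole class attribute value, so it finds nothing if the live page lists the classes in another order or adds an extra class. The made-up markup below reverses the two classes from the question to show the difference; a single class name or a CSS selector via select_one() does not depend on the order.

```python
from bs4 import BeautifulSoup

# Made-up markup: the same two classes as in the question, in the other order
html = '<div class="entry-content article-content"><p>Text</p></div>'
soup = BeautifulSoup(html, "html.parser")

# class_ with a space-separated string must equal the whole attribute value
exact = soup.find("div", class_="article-content entry-content")
# class_ with one class name matches if the tag has that class at all
single = soup.find("div", class_="article-content")
# A CSS selector requires both classes but ignores their order
css = soup.select_one("div.article-content.entry-content")

print(exact)          # None
print(single.p.text)  # Text
print(css.p.text)     # Text
```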