How to re-scrape the news on a website using newspaper3k

Question

I'm trying to build a dataset for sentiment analysis of news articles, and I'm using Newspaper3k to scrape the articles from websites. I scraped a few websites but didn't store the articles properly, so I can't use them. When I scrape the same websites again, it only picks up the new articles and skips the ones it already scraped. Is there a way to scrape the previously scraped articles again?

score 1 · Answer 1 · answered Jun 21 '18 at 21:03

By default, newspaper caches all previously extracted articles and eliminates any article which it has already extracted.

This feature exists to prevent duplicate articles and to increase extraction speed.

You may opt out of this feature with the memoize_articles parameter.

For example, in your case set it to False:

import newspaper
cbs_paper = newspaper.build('http://cbs.com', memoize_articles=False)
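
Since the original problem was that the scraped articles were not stored, here is a minimal sketch of re-scraping with memoization turned off and writing the results to disk right away. The helper names `scrape_site` and `save_articles`, the `limit` parameter, and the JSON Lines output format are assumptions made for illustration; they are not part of newspaper3k.

```python
import json


def save_articles(articles, path):
    """Append each article's url, title and text to a JSON Lines file."""
    saved = 0
    with open(path, "a", encoding="utf-8") as fh:
        for article in articles:
            record = {
                "url": article.url,
                "title": article.title,
                "text": article.text,
            }
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            saved += 1
    return saved


def scrape_site(url, limit=10):
    """Download and parse up to `limit` articles, ignoring the article cache."""
    # Imported here so save_articles can be used without newspaper3k installed.
    import newspaper
    from newspaper.article import ArticleException

    # memoize_articles=False makes newspaper return already-seen articles too.
    paper = newspaper.build(url, memoize_articles=False)
    parsed = []
    for article in paper.articles[:limit]:
        try:
            article.download()
            article.parse()
        except ArticleException:
            continue  # skip articles that fail to download or parse
        parsed.append(article)
    return parsed
```

With newspaper3k installed, something like `save_articles(scrape_site('http://cbs.com'), 'articles.jsonl')` would then write one JSON object per article. Because the file is opened in append mode, running it twice adds the same articles twice, so delete the file first or filter on `url` if that matters.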
