0

This document says that that by default, newspaper caches all previously extracted articles and eliminates any article which it has already extracted.

>>> cbs_paper = newspaper.build('http://cbs.com')
>>> cbs_paper.size()
1030

>>> cbs_paper = newspaper.build('http://cbs.com')
>>> cbs_paper.size()
2

Okay, but it says nothing if I build once a website how I can retrieve the cashed articles?

Ahmad
  • 8,811
  • 11
  • 76
  • 141

1 Answers1

2

newspaper3k uses memoize to cache the articles for a source

setting memoize to false would stop the caching mechanism

cbs_paper = newspaper.build('http://cbs.com', memoize_articles=False)

however, if you still want the caching and want to access the cached articles you can find .newspaper_scraper dir inside the temp folder (Path in windows machine)

C:\Users\your_user\AppData\Local\Temp\.newspaper_scraper\memoized

For Linux-based OSes, try looking in

/tmp/.newspaper_scraper/memoized/

For macOS, look in the directory specified by $TMPDIR. This may be any or none of the following:

/tmp/.newspaper_scraper/memoized/
/private/tmp/.newspaper_scraper/memoized/
~/Library/Caches/TemporaryItems/.newspaper_scraper/memoized/
samueldy
  • 75
  • 6
Akash Sahu
  • 21
  • 3