I'm trying to scrape a number of webpages using newspaper3k, and my program is raising an ArticleException because the download fails with a 503 Server Error. Can anyone help me identify the reason for this and get around it? To be exact, I'm not looking to catch these exceptions; I want to understand why they occur and prevent them if possible.
from newspaper import Article
dates = list()
titles = list()
urls = ['https://www.rbnz.govt.nz/research-and-publications/speeches/2021/speech2021-06-29',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2021/speech2021-06-02',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2021/fec-mps-hearing-may-21',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2021/speech2021-05-06',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2021/fec-fsr-hearing-may-21',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2021/speech2021-03-04',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2021/fec-2019-20-reserve-bank-annual-review',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2020/speech2020-12-02',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2020/speech2020-10-28',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2020/speech2020-10-22',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2020/speech2020-10-19',
'https://www.rbnz.govt.nz/research-and-publications/speeches/2020/speech2020-09-14']
for url in urls:
    speech = Article(url)
    speech.download()
    speech.parse()
    dates.append(speech.publish_date)
    titles.append(speech.title)
Here's my Traceback:
---------------------------------------------------------------------------
ArticleException Traceback (most recent call last)
<ipython-input-5-217a6cafe26a> in <module>
20 speech = Article(url)
21 speech.download()
---> 22 speech.parse()
23 dates.append(speech.publish_date)
24 titles.append(speech.title)
/opt/anaconda3/lib/python3.8/site-packages/newspaper/article.py in parse(self)
189
190 def parse(self):
--> 191 self.throw_if_not_downloaded_verbose()
192
193 self.doc = self.config.get_parser().fromstring(self.html)
/opt/anaconda3/lib/python3.8/site-packages/newspaper/article.py in throw_if_not_downloaded_verbose(self)
529 raise ArticleException('You must `download()` an article first!')
530 elif self.download_state == ArticleDownloadState.FAILED_RESPONSE:
--> 531 raise ArticleException('Article `download()` failed with %s on URL %s' %
532 (self.download_exception_msg, self.url))
533
ArticleException: Article `download()` failed with 503 Server Error: Service Temporarily Unavailable
for url: https://www.rbnz.govt.nz/research-and-publications/speeches/2021/speech2021-06-29
on URL https://www.rbnz.govt.nz/research-and-publications/speeches/2021/speech2021-06-29
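
To narrow down where the 503 comes from, it may help to request one of the URLs outside newspaper3k and look at the status code and response body directly. Below is a minimal diagnostic sketch using only the standard library. The `fetch_html` helper, the `BROWSER_UA` string, and the retry and delay values are hypothetical names and numbers chosen for illustration. The idea that the server rejects non-browser user agents or rapid successive requests is an assumption, not something confirmed for this site.

```python
import time
import urllib.error
import urllib.request

# Assumption: an illustrative browser-like User-Agent string.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0 Safari/537.36"
)


def fetch_html(url, opener=urllib.request.urlopen, retries=3, delay=2.0,
               sleep=time.sleep):
    """Return (status, body) for url, pausing and retrying on HTTP 503.

    `opener` and `sleep` are injectable so the helper can be exercised
    without network access.
    """
    request = urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
    for attempt in range(retries):
        try:
            with opener(request, timeout=10) as response:
                body = response.read().decode("utf-8", errors="replace")
                return response.status, body
        except urllib.error.HTTPError as err:
            # Give up on any non-503 error, or once the retries run out,
            # and hand back the error body so it can be inspected.
            if err.code != 503 or attempt == retries - 1:
                return err.code, err.read().decode("utf-8", errors="replace")
            # Wait a little longer before each new attempt.
            sleep(delay * (attempt + 1))
    return None, ""


if __name__ == "__main__":
    status, body = fetch_html(
        "https://www.rbnz.govt.nz/research-and-publications/speeches/2021/speech2021-06-29"
    )
    print(status)
    print(body[:500])  # the start of the body often hints at who sent the 503
```

If this request succeeds where the newspaper3k one fails, the user agent or request pacing is a plausible culprit. In that case the fetched HTML can be handed to newspaper3k with `speech.download(input_html=body)` before `speech.parse()`, or the user agent can be set through a `newspaper.Config` object (`config.browser_user_agent`) passed as `Article(url, config=config)`. If the 503 persists even here, the body of the error response is the next thing to read.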