I've written a simple Python script for news summarization that uses the newspaper3k library on Python 3.10. I ran the script on my personal laptop and it works fine. I then moved the libraries and the script to a virtual machine in our organization and tried running it there (using PyCharm). However, I get an error when calling parse() on the Article object.
Here's the script:
import nltk
import newspaper
from textblob import TextBlob
from newspaper import Article
from newspaper import Config
url = "https://press.un.org/en/2023/sc15277.doc.htm"
config = Config()
config.request_timeout = 60
output = Article(url,config=config)
print(f'URL: {output.url}')
output.download()
output.parse()
output.nlp()
print(f'Summary: {output.summary}')
The error I get is:
URL: https://press.un.org/en/2023/sc15277.doc.htm
Traceback (most recent call last):
File "C:\Users\----------\PycharmProjects\pythonProject\main.py", line 14, in <module>
output.parse()
File "C:\Users\-----------\PythonInterpreter\Lib\site-packages\newspaper\article.py", line 191, in parse
self.throw_if_not_downloaded_verbose()
File "C:\Users\-----------\PythonInterpreter\Lib\site-packages\newspaper\article.py", line 531, in throw_if_not_downloaded_verbose
raise ArticleException('Article `download()` failed with %s on URL %s' %
newspaper.article.ArticleException: Article `download()` failed with HTTPSConnectionPool(host='press.un.org', port=443): Max retries exceeded with url: /en/2023/sc15277.doc.htm (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1002)'))) on URL https://press.un.org/en/2023/sc15277.doc.htm
Process finished with exit code 1
I tried adding the website's certificate in PyCharm and changing the proxy settings, but the error persists. The URL is accessible from the virtual machine. I also tested connectivity to the URL in PyCharm, and the connection was successful.
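To help narrow this down, here is a minimal diagnostic sketch that prints the CA bundle settings the Python interpreter on the VM actually sees. It uses only the standard library. The helper name `describe_trust_config` is made up for illustration. It rests on an assumption worth verifying: as far as I understand, newspaper3k downloads pages through requests, which reads `REQUESTS_CA_BUNDLE` or `CURL_CA_BUNDLE` from the environment, so a certificate added only in PyCharm's settings may not be picked up by the script.

```python
import os
import ssl

# Environment variables commonly consulted for CA bundles:
# REQUESTS_CA_BUNDLE / CURL_CA_BUNDLE by requests,
# SSL_CERT_FILE / SSL_CERT_DIR by OpenSSL.
CA_ENV_VARS = ("REQUESTS_CA_BUNDLE", "CURL_CA_BUNDLE", "SSL_CERT_FILE", "SSL_CERT_DIR")


def describe_trust_config(env=None):
    """Return the CA-related settings visible to this interpreter.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be replaced with a plain dict.
    """
    env = os.environ if env is None else env
    paths = ssl.get_default_verify_paths()
    # Unset variables are reported as None.
    report = {name: env.get(name) for name in CA_ENV_VARS}
    # Default CA file and directory of the OpenSSL build Python links against
    # (None when the default location does not exist on this machine).
    report["default_cafile"] = paths.cafile
    report["default_capath"] = paths.capath
    return report


if __name__ == "__main__":
    for key, value in describe_trust_config().items():
        print(f"{key}: {value}")
```

If every entry comes back as None or points at a bundle that does not contain the organization's root certificate, that would be consistent with the "self signed certificate in certificate chain" message above, although this sketch alone does not prove it.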