I've written a simple Python script for news summarization that uses the newspaper3k library on Python 3.10. The script runs fine on my personal laptop. I moved the libraries and the script to a virtual machine in our organization and tried running it there (using PyCharm), but I get an error when calling article.parse().

Here's the script:

import nltk
import newspaper
from textblob import TextBlob
from newspaper import Article
from newspaper import Config

url = "https://press.un.org/en/2023/sc15277.doc.htm"
config = Config()
config.request_timeout = 60
output = Article(url, config=config)
print(f'URL: {output.url}')
output.download()
output.parse()
output.nlp()
print(f'Summary: {output.summary}')

The error I get is:

URL: https://press.un.org/en/2023/sc15277.doc.htm
Traceback (most recent call last):
  File "C:\Users\----------\PycharmProjects\pythonProject\main.py", line 14, in <module>
    output.parse()
  File "C:\Users\-----------\PythonInterpreter\Lib\site-packages\newspaper\article.py", line 191, in parse
    self.throw_if_not_downloaded_verbose()
  File "C:\Users\-----------\PythonInterpreter\Lib\site-packages\newspaper\article.py", line 531, in throw_if_not_downloaded_verbose
    raise ArticleException('Article `download()` failed with %s on URL %s' %
newspaper.article.ArticleException: Article `download()` failed with HTTPSConnectionPool(host='press.un.org', port=443): Max retries exceeded with url: /en/2023/sc15277.doc.htm (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1002)'))) on URL https://press.un.org/en/2023/sc15277.doc.htm

Process finished with exit code 1

I tried adding the website's certificate in PyCharm and changing the proxy settings, but the error persists. The URL is accessible from the virtual machine. I also tested connectivity to the URL from within PyCharm, and the connection was successful.
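One way to narrow this down is to fetch the page yourself with an explicit trust store and then hand the HTML to newspaper, so the download step no longer depends on which certificates the library's HTTP client trusts. The sketch below is only that, a sketch: it assumes (the traceback does not confirm this) that the "self signed certificate in certificate chain" message comes from a root certificate the organization's network inserts, and that you can obtain that root as a PEM file. The `CORP_CA_BUNDLE` environment variable and the helper names `make_ssl_context`, `fetch_html` and `summarize` are hypothetical; `Article.download(input_html=...)` is the newspaper3k parameter for supplying pre-fetched HTML.

```python
import os
import ssl
import urllib.request


def make_ssl_context(ca_bundle=None):
    """Return a verifying SSL context.

    ca_bundle is a hypothetical path to a PEM file holding the
    organization's root certificate. With None, the system default
    trust store is used instead.
    """
    return ssl.create_default_context(cafile=ca_bundle)


def fetch_html(url, ca_bundle=None, timeout=60):
    """Download the page with certificate verification still enabled."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    context = make_ssl_context(ca_bundle)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def summarize(url, ca_bundle=None):
    """Parse and summarize HTML that was fetched outside of newspaper."""
    # Imported here so the fetch helpers can be tried without newspaper3k.
    from newspaper import Article

    article = Article(url)
    # input_html makes newspaper skip its own network request.
    article.download(input_html=fetch_html(url, ca_bundle))
    article.parse()
    article.nlp()
    return article.summary


if __name__ == "__main__":
    # CORP_CA_BUNDLE is an assumed variable name, not a standard one.
    bundle = os.environ.get("CORP_CA_BUNDLE")
    print(summarize("https://press.un.org/en/2023/sc15277.doc.htm", bundle))
```

If `fetch_html` succeeds with the organization's root certificate but fails without it, the trust store is the likely culprit rather than the script. In that case, since newspaper3k makes its requests through the `requests` library, pointing the `REQUESTS_CA_BUNDLE` environment variable at the same PEM file may be enough to make the original script work unchanged.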

  • Sounds like the virtual machine has a bad certificate. – John Gordon May 13 '23 at 20:43
  • Do you know how I can fix that? I already have a valid client certificate on the server. – midhunsugathan May 13 '23 at 21:08
  • @midhunsugathan Maybe it's not the certificate but this part of the error: `Max retries exceeded with url: /en/2023/sc15277.doc.htm`. I ran this code on Google Colab (except for the line `output.nlp()`) and got no errors. Can you [edit] your question and describe your goal with this code, i.e. what the desired results are? – Mauricio Arias Olave May 26 '23 at 19:06
