I have even tried the example commands from pypi.org, but no article is getting downloaded.

from newspaper import Article

url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
article = Article(url)
article.download()
article.html

article.html only gives an empty string ''. When I try article.parse(), it gives this error:

You must download() an article first!

I have tried this workaround:

from time import sleep
from newspaper.article import ArticleDownloadState, ArticleException

slept = 0  # seconds waited so far
while article.download_state == ArticleDownloadState.NOT_STARTED:
    # Raise exception if article download state does not change after 10 seconds
    if slept > 9:
        raise ArticleException('Download never started')
    sleep(1)
    slept += 1

I am still unable to solve the issue.

  • I was able to download and parse the article using the `newspaper` library for Python 3.6. If the HTML is coming up blank, there is some issue with the request. – Steven Aug 10 '18 at 21:45
  • It's not working for me and I cannot find a solution. – Udhai kumar Aug 11 '18 at 18:27

1 Answer

Sometimes you have to clean up the link first, e.g. when it comes from an RSS feed.

For Google Alerts links, Python's urllib.parse module can be used to extract the real article URL from the redirect link.

Example

google_url = 'https://www.google.com/url?rct=j&sa=t&url=https://www.timesnownews.com/international/article/european-union-chief-donald-tusk-lashes-out-at-donald-trump-stance-on-europe/311933&ct=ga&cd=CAIyHDlhZGYyMmM4NzAwYzNlZDc6Y28udWs6ZW46R0I&usg=AFQjCNHrsEaxxjXvWB3wM_1aRjNg6aeZvw'

Get the value of the url= query parameter:

from urllib.parse import urlparse, parse_qs

url = urlparse(google_url)
print(parse_qs(url.query)['url'][0])
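
If the links come from mixed sources, the same idea can be wrapped in a small helper that falls back to the original link when there is no url= parameter. This is only a sketch: the function name `extract_target_url` and the example.com addresses are placeholders of my own, not part of newspaper or Google Alerts.

```python
from urllib.parse import urlparse, parse_qs


def extract_target_url(link):
    """Return the 'url' query parameter of a redirect link, or the link itself if absent."""
    query = parse_qs(urlparse(link).query)
    targets = query.get('url')
    # parse_qs returns a list per key; take the first value if the key exists
    return targets[0] if targets else link


# Placeholder redirect link in the same shape as a Google Alerts URL
redirect = 'https://www.google.com/url?rct=j&sa=t&url=https://example.com/some/article&ct=ga'
print(extract_target_url(redirect))                      # https://example.com/some/article
print(extract_target_url('https://example.com/plain'))   # https://example.com/plain
```

The cleaned URL can then be passed to Article(...) instead of the redirect link.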

Also keep in mind that, when testing your script, only the value of the last expression is shown: earlier output is overwritten unless each value is assigned separately.

When testing the following, the output will only include article.text:

article = Article('https://www.google.com/url?rct=j&sa=t&url=https://www.timesnownews.com/international/article/european-union-chief-donald-tusk-lashes-out-at-donald-trump-stance-on-europe/311933&ct=ga&cd=CAIyHDlhZGYyMmM4NzAwYzNlZDc6Y28udWs6ZW46R0I&usg=AFQjCNHrsEaxxjXvWB3wM_1aRjNg6aeZvw')
article.download()
article.parse()
article.top_image
article.text

Assigning the values separately and printing them works when testing your script:

top_image = article.top_image
text = article.text
print(top_image, text)
Patrick