
I am trying to extract only the actual content (text) from a given URL using the newspaper package in Python 3. This works in general, but one of my URLs points to a page that contains multiple Tumblr posts.

From the URL below, I want the content of the first post only, i.e., the paragraph starting with "The Karnataka Assembly election 2018 result is close to being known as vote counting is underway on Tuesday, "

https://poonamparekh.tumblr.com/post/173920050130/karnataka-election-results-modi-rallies-set-to

When I extract content from the above URL, I get the content of the 6th post as my output instead of the first post. I need the first post as my output. Can anyone help me achieve this?

Here is my code:

from newspaper import Article

url = "https://poonamparekh.tumblr.com/post/173920050130/karnataka-election-results-modi-rallies-set-to"
print(url)
article = Article(url, language='en')
article.download()
print('article_state : ', article.download_state)

# download_state == 2 means the download succeeded
if article.download_state == 2:
    try:
        article.parse()
        # Use the full extracted text; article.text[0] would be only its first character
        result = article.text
        print(result[:150])
        if result == '':
            print('----MESSAGE : No description written for this post')
    except Exception as e:
        print(e)
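
One direction I am considering is to cut the page down to the first post before any text extraction runs. The following is only a rough sketch using the standard library's html.parser, not a confirmed fix. It assumes each post sits in an element whose class list contains "post" (a hypothetical class name, since the real one depends on the Tumblr theme) and that the markup inside a post has balanced opening and closing tags. FirstPostParser and first_post_text are names I made up for illustration.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FirstPostParser(HTMLParser):
    """Collect the text of the first element whose class list contains post_class."""

    def __init__(self, post_class="post"):
        super().__init__(convert_charrefs=True)
        self.post_class = post_class  # hypothetical; depends on the Tumblr theme
        self.depth = 0      # nesting depth inside the first post (0 = outside)
        self.done = False   # True once the first post has been closed
        self.chunks = []    # text fragments found inside the first post

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            # Already inside the first post: track nested elements.
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.post_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.done or tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # The first post container just closed; ignore everything after it.
            self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.chunks.append(data)


def first_post_text(html, post_class="post"):
    """Return the whitespace-normalized text of the first matching post, or ''."""
    parser = FirstPostParser(post_class)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Made-up sample markup standing in for a page with several posts.
sample = """
<div class="post"><p>First post text.</p><img src="a.png"><p>More of it.</p></div>
<div class="post"><p>Sixth post text.</p></div>
"""
print(first_post_text(sample))
```

For the real page, the HTML that newspaper already downloaded (article.html) could be passed to first_post_text once the correct class name has been found by inspecting the page source. If the theme's markup has unclosed tags, the depth counting above will drift and a more tolerant HTML parser would be needed.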
bunny sunny