Questions tagged [python-newspaper]

Newspaper is a Python library which delivers Instapaper style article extraction.

Newspaper is inspired by requests and powered by lxml.

111 questions
3
votes
1 answer

Error when importing newspaper module

I'm trying to use the newspaper package on Python 2 and I keep getting the error "cannot import name images" when I download it. I followed previous SO advice and created an image directory in /usr/local/lib/python2.7/site-packages/newspaper…
3
votes
1 answer

Python 3: How can I get news articles that contain a certain keyword

I'm trying to write a little web app that returns the sentiment of a news article involving a keyword. I used the TextBlob and Newspaper3K Python 3 packages. I tried to make the URL string for Newspaper3K the result of a search query on Google News…
Jack Pan
  • 1,261
  • 3
  • 11
  • 12
3
votes
0 answers

Python newspaper library issues and errors

I am using the Python newspaper library on a Linux VPS. The first issue is that the newspaper sites whose articles I'm trying to parse say I'm using an ad blocker, so they don't show any articles as they want me to…
Del
  • 131
  • 2
  • 13
3
votes
2 answers

ImportError when installing newspaper

I am pretty new to Python and am trying to import newspaper for article extraction. Whenever I try to import the module I get ImportError: cannot import name images. Has anyone come across this problem and found a solution?
sammy88888888
  • 458
  • 1
  • 5
  • 18
2
votes
1 answer

Web Scraping with Python and newspaper3k lib does not return data

I have installed the newspaper3k library on my Mac with sudo pip3 install newspaper3k. I'm using Python 3. I want to return the data that the Article object supports, namely url, date, title, text, summary and keywords, but I do not get any…
taga
  • 3,537
  • 13
  • 53
  • 119
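
For reference, a minimal sketch of how the fields named in this question are usually read from a newspaper3k Article. The helper names article_fields and fetch_article are hypothetical, and the stub object at the end only stands in for a real article so the example runs offline. It assumes nlp() has to run before summary and keywords are filled in, and that the NLTK tokenizer data nlp() relies on is installed.

```python
from types import SimpleNamespace  # only needed for the offline stub below


def article_fields(article):
    """Collect the fields named in the question from an Article-like object."""
    return {
        "url": article.url,
        "date": article.publish_date,
        "title": article.title,
        "text": article.text,
        "summary": article.summary,
        "keywords": list(article.keywords),
    }


def fetch_article(url):
    """Download, parse and analyse one article (needs network access)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from newspaper import Article

    article = Article(url)
    article.download()   # fetch the HTML
    article.parse()      # fills title, text, publish_date, authors
    article.nlp()        # fills summary and keywords
    return article_fields(article)


# Offline illustration with a stand-in object (not a real Article).
stub = SimpleNamespace(
    url="https://example.com/story",
    publish_date=None,
    title="Example",
    text="Body text.",
    summary="",
    keywords=[],
)
fields = article_fields(stub)
```

If every field comes back empty for a real URL, it may be worth checking that download() and parse() were both called before the attributes were read.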
2
votes
2 answers

Shortcomings of Newspaper3k: How to Scrape ONLY Article HTML? Python

Hello and thank you kindly for your help. I've been using Python and Newspaper3k to scrape websites, but I've noticed that some functions are, well, not functional. In particular, I've only been able to scrape the article HTML of roughly 1/10 or…
2
votes
0 answers

Reducing memory usage by newspaper3k

I am trying to host a newspaper spider to send news to my phone every day. However, I noticed that deleting an Article object does not free the memory, and this takes up around 200 MB of RAM per run. I am currently running the spider in a separate .py…
2
votes
2 answers

How to input a list of URLs saved in a .txt to a Python program?

I have a list of URLs saved in a .txt file and I would like to feed them, one at a time, to a variable named url to which I apply methods from the newspaper3k python library. The program extracts the URL content, authors of the article, a summary of…
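
One way to approach this, sketched with the standard library only: read the file line by line, skip blank lines, and pass each URL to a handler function. The names read_urls and process_all are hypothetical, and the handler is left to the caller, so nothing here depends on newspaper3k being installed.

```python
from pathlib import Path


def read_urls(path):
    """Return the non-empty lines of a text file as a list of URLs."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()  # drop surrounding whitespace and newlines
        if line:
            urls.append(line)
    return urls


def process_all(path, handle):
    """Call handle(url) once per URL and collect the results keyed by URL."""
    results = {}
    for url in read_urls(path):
        try:
            results[url] = handle(url)
        except Exception as exc:  # keep going if one URL fails
            results[url] = exc
    return results
```

With newspaper3k, handle could be a small function that builds Article(url), calls download(), parse() and nlp(), and returns the authors and summary.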
2
votes
1 answer

Why does the Python module newspaper3k return 0 articles for Tencent, Sina and wallstreetcn?

The newspaper3k library is amazing. I am addicted to it. May I ask why Source and build() return 0 articles from most Chinese financial news pages? Is there a problem in my code? from newspaper import Article,…
2
votes
0 answers

Extract only the first post's content from a URL that has multiple Tumblr posts with Python

I am trying to extract only the actual content/text from a given input URL using the newspaper package in Python 3. I have succeeded in doing so, but one of my URLs contains multiple Tumblr posts on the same page. In the below URL I want content of first post…
bunny sunny
  • 301
  • 6
  • 15
2
votes
1 answer

How to reread the news on a website using newspaper3k

I'm trying to create a dataset to do sentiment analysis on news articles. I'm using Newspaper3k to scrape articles from the website. I scraped a few websites but didn't store the articles properly and hence I can't use them. When I try scraping the…
2
votes
0 answers

How can I use the Newspaper library for websites that need authentication?

I'm using the newspaper3k library in order to download the HTML of several articles from different news sites (which is so far working just fine). However, as I need the full…
Vee
  • 21
  • 3
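
A common workaround, sketched here under assumptions: fetch the page yourself with an authenticated HTTP session and hand the HTML to newspaper3k, instead of letting the library download it. The helper fetch_with_session is hypothetical. It assumes Article.download() accepts an input_html argument (as it does in the newspaper3k releases I am aware of), and the login URL and form field names in the comments are placeholders, not a real site's API.

```python
def fetch_with_session(session, url, make_article):
    """Download url through an already authenticated session, then parse it.

    session      -- object with a requests-style get() method
    make_article -- callable building an article from a URL, e.g. newspaper.Article
    """
    response = session.get(url, timeout=30)
    response.raise_for_status()

    article = make_article(url)
    # Assumption: download() skips the network when given the HTML directly.
    article.download(input_html=response.text)
    article.parse()
    return article


# Intended use (needs requests, newspaper3k and real credentials):
#
#   import requests
#   from newspaper import Article
#
#   session = requests.Session()
#   # Hypothetical login endpoint and form fields; every site differs.
#   session.post("https://news.example.com/login",
#                data={"username": "me", "password": "secret"})
#   article = fetch_with_session(session, "https://news.example.com/story", Article)
#   print(article.title)
```

Passing the session and the article factory in as arguments keeps the helper independent of how a particular site handles login.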
2
votes
0 answers

Python newspaper - cannot extract article if the URL is not in English

I am trying to get the content of a news article with the Python newspaper module. I can find the body of a news item with the following code. The code parses the feed URL in feed_url variable with feedparser and then tries to find the news body and…
Istiaque Ahmed
  • 6,072
  • 24
  • 75
  • 141
2
votes
0 answers

python, newspaper, unhashable type: 'tzutc' and writing to dataframe

I have a bunch of URLs whose text I want to download for further analysis. I am a Python newbie. I have two problems: (1) I have a really weird type error; and (2) the results are not being written to the data frame. My code is as…
tom
  • 315
  • 1
  • 3
  • 10
2
votes
0 answers

Newspaper module import problems on Beanstalk

Has anyone tried using the newspaper3k Python library on AWS Elastic Beanstalk with Python 3.4? I'm getting a strange error, despite images.py existing in the newspaper directory. Traceback (most recent call last): File…