Questions tagged [python-newspaper]

Newspaper is a Python library which delivers Instapaper style article extraction. Newspaper is inspired by requests and powered by lxml.

111 questions
2
votes
1 answer

ImportError: No module named '_sqlite3' error - Underscore relevance?

I'm using Python 3.4; I recently upgraded from Python 3.3.2. I'm following these instructions on how to install newspaper, which is a Python library/tool: https://github.com/codelucas/newspaper. I'm getting errors after executing this command: curl…
treetop
  • 165
  • 1
  • 13
2
votes
1 answer

Python package (Newspaper) install error

I'm trying to install a package, which failed with the error below. I googled and installed setuptools, but I'm still getting the same error. Command: pip install newspaper Collecting nltk==2.0.5 (from newspaper) Using cached nltk-2.0.5.tar.gz Complete output…
Peter
  • 111
  • 2
  • 9
2
votes
0 answers

How to extract an article in Chinese

from newspaper import Article import pdb from unidecode import unidecode def get_article_newspaper(url): article = Article(url,en='zh') # Chinese article.download(); article.parse()# article.text if blank! print…
Abhishek Bhatia
  • 9,404
  • 26
  • 87
  • 142
1
vote
2 answers

Python library newspaper is not returning the published date

I am using the newspaper Python library to extract some data from news stories. The problem is that I am not getting this data for some URLs. These URLs work fine. They all return 200. I am doing this for a very large dataset but this is one of the URLs…
Sam Hall
  • 35
  • 4
1
vote
1 answer

News article extraction using requests, bs4 and newspaper packages: why doesn't links=soup.select(".r a") find anything? This code was working earlier

Objective: I am trying to download news articles based on keywords to perform sentiment analysis. This code was working a few months ago but now it returns a null value. I tried fixing the issue but links=soup.select(".r a") returns null…
1
vote
0 answers

Traceback (most recent call last): in Python

I am new to Python and am getting an error while running my application. I don't know where I went wrong; please help me fix the error. If you have any questions, please feel free to ask at any time. newspaper.py This is my newspaper.py file…
Monu Patil
  • 345
  • 5
  • 18
1
vote
1 answer

Newspaper3k scrape several websites

I want to get articles from several websites. I tried this, but I don't know what I have to do next: lm_paper = newspaper.build('https://www.lemonde.fr/') parisien_paper = newspaper.build('https://www.leparisien.fr/') papers = [lm_paper,…
LJRB
  • 199
  • 2
  • 11
1
vote
1 answer

ArticleException error when web scraping news articles with Python

I am trying to web scrape news articles by certain keywords. I use Python 3. However, I am not able to get all the articles from the newspaper. After some articles have been scraped and written to the CSV file, I get an ArticleException error. Could anyone help me…
crackers
  • 327
  • 2
  • 12
1
vote
1 answer

Missing temp folder in Elastic Beanstalk with newspaper library

Every once in a while, the temp folder on my deployment server seems to go missing. I am using Flask and Newspaper on AWS Elastic Beanstalk. I am using the Newspaper library to scrape meta tags from external URLs. Error on the…
1
vote
0 answers

Newspaper3k Library - scraping behind paywalls

Is there a way to use the Newspaper3k library to scrape behind paywalls if you have a subscription? As we don't have direct access to the URL request method, I'm not sure how we can, for example, pass in a session cookie. Is there any way, maybe…
Gummy bears
  • 176
  • 6
1
vote
1 answer

Why isn't my Newspaper3k code working with Newsweek?

I'm working out of a Jupyter Notebook and having an issue with newspaper being unable to pull down anything from Newsweek. I can get it running on Goose, but I wanted to have a backup in case Goose ever failed. I have tried other websites like Fox, Yahoo,…
M4cJunk13
  • 419
  • 8
  • 22
1
vote
1 answer

Limiting the URL output from newspaper

I'm using newspaper3k to extract URLs from news.google, but the problem is I keep getting all the URLs (I've disabled memoize because I need the full list). I would like to print only the top 5 links or 5 random links; it doesn't really matter which. I've…
Oliver May
  • 23
  • 3
1
vote
1 answer

ModuleNotFoundError: No module named 'newspaper3k'

I'm attempting to install the newspaper module for Python, but I keep getting an error saying that there is no such module. I've tried making sure my directory is set to the right place, and I've checked that the module is installed. PyCharm, which…
Alex F
  • 23
  • 4
1
vote
1 answer

Newspaper Python cache issue: every call returns the same output

I use this module: https://github.com/codelucas/newspaper to download bitcoin articles from https://news.bitcoin.com/. But when I try to get the next articles from the next page 'https://news.bitcoin.com/page/2/page' I get the same output. Same for any other…
John Snow
  • 11
  • 2
1
vote
1 answer

Import error with Ubuntu script using newspaper module

I have a script that will run locally, but not on my Ubuntu server. Other scripts work fine on both platforms, but this specific one throws an import error when I attempt to run it on Ubuntu. root@ip-xxx-xx-xx-xxx:~# /usr/bin/python3.5…
JLB
  • 11
  • 2