Questions tagged [python-newspaper]

Newspaper is a Python library which delivers Instapaper style article extraction. Newspaper is inspired by requests and powered by lxml.

111 questions
1
vote
0 answers

Process a list of URLs with Newspaper3k (python3 lib) using threading that never ends

A script reads a list of URLs; I put that list into a Queue and then process the URLs with python-newspaper3k. I have a lot of varied URLs, and many of them are not very popular websites. The problem is that the processing never ends. Sometimes it reached…
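One common reason a Queue-based worker pool appears to hang is that a worker raises an exception before calling task_done(), so Queue.join() never returns. Whether that is the cause in this question is not known from the excerpt. The sketch below uses only the standard library; process_url is a hypothetical stand-in for the real download-and-parse step, and run and worker are illustrative names.

```python
import queue
import threading


def process_url(url):
    """Hypothetical stand-in for the real download-and-parse step."""
    if not url.startswith("http"):
        raise ValueError(f"unsupported URL: {url}")
    return len(url)


def worker(tasks, results, errors):
    while True:
        url = tasks.get()
        try:
            if url is None:  # sentinel: no more work for this thread
                return
            results[url] = process_url(url)
        except Exception as exc:
            # Record the failure instead of letting the thread die silently.
            errors[url] = exc
        finally:
            # Runs on success, failure, and sentinel, so tasks.join() can return.
            tasks.task_done()


def run(urls, num_workers=4):
    tasks = queue.Queue()
    results, errors = {}, {}
    threads = [
        threading.Thread(target=worker, args=(tasks, results, errors), daemon=True)
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for thread in threads:
        thread.join()
    return results, errors
```

A per-request timeout in the download step is worth checking as well, since a single stalled connection can block a worker indefinitely.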
1
vote
2 answers

Parse HTML String from MySQL in Newspaper3k

I have a MySQL table full of crawled news article HTML data. I would like to extract the article texts with the newspaper3k module, which I have done many times before. The only difference now is that I am not extracting a URL and parsing the result with…
hag o hi
  • 117
  • 1
  • 1
  • 9
1
vote
1 answer

Cannot download article using newspaper3k

I have even tried the commands in pypi.org but no article is getting downloaded. from newspaper import Article url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/' article =…
1
vote
0 answers

Newspaper3k syntax error or wrong python version?

I'm trying to use newspaper3k and I followed all the steps to install it. Everything works locally. When I push to my Azure App Service, I receive the error below. My Python version on Azure is 3.6.4.4. Any suggestions? Traceback (most recent…
EK_AllDay
  • 1,095
  • 2
  • 15
  • 35
1
vote
1 answer

Use external file with Newspaper3k

I'm performing a number of scraping and summary tasks and have found that newspaper works perfectly for most of my needs. I have a series of PDF files I also need to look at and perform similar tasks with. I can find other apps to open and…
Thatch
  • 157
  • 1
  • 2
  • 11
1
vote
1 answer

What articles does the newspaper package of Python return?

My basic question is how does the newspaper package in Python determine what urls/articles it returns? One would think it simply returns all of the article links contained on the url you provide it but it doesn't seem to work that way. As an…
r1234
  • 21
  • 2
1
vote
1 answer

Problems installing geograpy with Anaconda Prompt

I am trying to use the geograpy module through the Anaconda Prompt. When I run pip install geograpy I get this warning that terminates the installation. newspaper3k is in my AppData/Local/Continuum/Anaconda3/Lib/site-packages folder after I…
1
vote
3 answers

issue with installing python newspaper package

I am installing the Python newspaper library with the following command in a virtual environment: pip install newspaper I get the following error. It persists even after I tried a few solutions from Stack Overflow. I had the same issue…
utengr
  • 3,225
  • 3
  • 29
  • 68
1
vote
1 answer

Remove special quotation marks and other characters

I am trying to download articles using Article from newspaper, and to tokenize the words using nltk's word_tokenize. The problem is, when I try to print the parsed article text, some of these articles have special quotation marks like “, ”, ’,…
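A minimal sketch of one way to handle this, assuming the goal is to map typographic quotation marks to plain ASCII before tokenizing. It uses only str.translate from the standard library; the names SMART_PUNCTUATION and normalize_quotes are illustrative, and the set of characters covered is an assumption that may need extending for real articles.

```python
# Translation table from typographic punctuation to ASCII equivalents.
SMART_PUNCTUATION = str.maketrans({
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u2026": "...",  # horizontal ellipsis
})


def normalize_quotes(text: str) -> str:
    """Return text with curly quotes and ellipses replaced by ASCII."""
    return text.translate(SMART_PUNCTUATION)
```

The function can be applied to the parsed article text before passing it to a tokenizer, for example normalize_quotes(article.text).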
1
vote
1 answer

Python: Newspaper Module - Downloading from multiple URLs

I hate to start a new post but I am trying to accomplish the exact thing described in this question: Python: Newspaper Module - Any way to pool getting articles straight from URLs? In attempting to implement the solution, though, I am getting the…
bengen343
  • 139
  • 1
  • 2
  • 8
1
vote
0 answers

Python - Newspaper Library - Why is it missing sizable portions of articles?

I'm using the newspaper library, V. 2.7 found here. When I download, parse, and print the text, it gives me a much smaller portion of the article than exists in reality. Why is this? Is there any way to fix this? Here is my code: from newspaper…
Afflatus
  • 2,302
  • 5
  • 25
  • 40
1
vote
2 answers

How to take output from iterating, store that in a dictionary

I have this script (running Python 3.5) that uses the Google API and Newspaper. It searches Google for articles that have to do with sleep. Then, using Newspaper, I iterate over those URLs. All I'm asking Newspaper to do is return a list of…
user5813071
1
vote
1 answer

Downloading articles from multiple urls with newspaper

I've been trying to extract multiple articles from a webpage (Zeit Online, a German newspaper), for which I have a list of URLs I want to download articles from, so I do not need to crawl the page for URLs. The newspaper package for Python does an…
1
vote
0 answers

News aggregator for sentiment analysis

I am writing a little news sentiment analysis app in Python. I want to prepare a database of news articles to train my classifier on, so I am wondering what my best course of action is for fetching news articles from the web. I looked at…
WeaselFox
  • 7,220
  • 8
  • 44
  • 75
0
votes
0 answers

Python script fails to parse newspaper article when run in a virtual machine

I've written a simple Python script for news summarization, which uses the newspaper3k library on Python 3.10. I ran the script on my personal laptop and it works fine. I moved the libraries and script to a virtual machine in our organization and tried…