Questions tagged [newspaper3k]
49 questions
1
vote
1 answer
How do I remove unwanted classes and tags from newspaper3k object?
I want to extract news article contents and I'm currently using newspaper3k library:
from newspaper import Article
a = Article(url, memoize_articles=False, language='en')
a.download()
a.parse()
content = a.text
But for some websites, there are unwanted elements like…

jason
- 27
- 1
- 8
0
votes
0 answers
Python script fails to parse newspaper article when run in a virtual machine
I've written a simple Python script for news summarization, which uses the newspaper3k library on Python 3.10. I ran the script on my personal laptop and it works fine. I moved the libraries and script to a virtual machine in our organization and tried…

midhunsugathan
- 11
- 2
0
votes
1 answer
Using pyinstaller to create an executable program newspaper3k
I am working on a news crawling program and want to turn my Python file into an executable application. But I am having a lot of trouble with the newspaper3k library. My program works fine in PyCharm, but when I try to run the executable…

user21822637
- 1
- 1
0
votes
1 answer
Why is .summary on the Python newspaper3k module returning blank?
I'm currently writing a quick Python script to summarize a given news article using the newspaper3k module.
The following code to retrieve and print the text in the terminal works fine.
import newspaper
# Assign url
url = 'url'
# Extract web…

Kudu2
- 3
- 1
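A common cause of an empty `.summary` in newspaper3k is that `nlp()` was never called: `summary` and `keywords` are only filled in after `download()`, `parse()` and `nlp()` have run, in that order. The excerpt above is truncated, so whether this applies here is an assumption. A minimal sketch of the full sequence follows; `summarize_article` is a hypothetical helper name, and `nlp()` also needs the NLTK `punkt` tokenizer data to be installed.

```python
def summarize_article(article):
    """Run the full pipeline on an Article-like object and return its summary.

    `article` is expected to expose download(), parse(), nlp() and a
    `summary` attribute, as newspaper.Article does.
    """
    article.download()  # fetch the HTML
    article.parse()     # extract title, text, authors, ...
    article.nlp()       # populate keywords and summary
    return article.summary


# Usage with the real library (not executed here; the URL is a placeholder):
#   from newspaper import Article
#   print(summarize_article(Article("https://example.com/some-article")))
```

Passing the article object in, rather than a URL, keeps the helper easy to exercise with a stand-in object when no network is available.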
0
votes
0 answers
GitHub Actions not accessing download from Newspaper3k
I've been trying to use GitHub Actions to run a Python script. Everything seems to run fine, except a specific function that uses the Newspaper3k package. The article appears to download fine (article.html works ok), but Article.parse() does not…

Dave C
- 367
- 5
- 19
0
votes
0 answers
Python Newspaper3k code suddenly not working
I have an Excel sheet containing links to various online news articles. Using Newspaper3k, I created a for loop that goes through all of the articles in the column containing the links and scrapes them, receiving insights…
0
votes
1 answer
Can't find publish_date with newspaper3k
I want to scrape an article from a website with the newspaper library (newspaper3k). However, it doesn't find the publish_date for the article, which is div.source-date in the website's source text, and the authors (or source rather), which is…

Linda Brck
- 71
- 6
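When newspaper3k's own date detection comes back empty, one fallback is to read the element directly from the downloaded HTML. The sketch below is an assumption-laden illustration, not the library's method: it uses only the standard library's `html.parser` to collect the text of the first `div` whose class list contains `source-date` (the selector named in the question). `SourceDateParser` and `extract_source_date` are hypothetical names, and the result is a raw string that would still need date parsing.

```python
from html.parser import HTMLParser


class SourceDateParser(HTMLParser):
    """Collect the text inside the first <div> with class "source-date"."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside the target div (counts nested divs)
        self.done = False   # True once the target div has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == "div":
                self.depth += 1
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if "source-date" in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_source_date(html):
    """Return the whitespace-normalized text of div.source-date, or None."""
    parser = SourceDateParser()
    parser.feed(html)
    text = " ".join("".join(parser.parts).split())
    return text or None
```

With the real library this would be called as `extract_source_date(article.html)` after `article.download()`.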
0
votes
1 answer
I want to scrape all the text (headings, bullets, paragraphs) from an article, except some tags at the start and end of the article
I want to scrape the articles from this site:
https://www.traveloffpath.com/covid-19-travel-insurance-everything-you-need-to-know/
and https://www.traveloffpath.com/what-to-do-if-your-flight-is-delayed-or-canceled/?swcfpc=1
I am stuck in the "p" tag…

Info Rewind
- 145
- 7
0
votes
0 answers
_tkinter.TclError displays on some news articles
Currently, I am writing a program that allows the user to input a link from a news site and then my program will display the title, author, and the summary of the inputted news article. I am currently using the newspaper module.
However, I realized…

myts999
- 45
- 9
0
votes
0 answers
Fetching thousands of URLs with Newspaper3k and multiprocessing slows down after a few hundred calls
I have code that is meant to:
a) call an API to get Google SERP results;
b) open each retrieved url with the newspaper3k python3 library, which extracts the text of the news article;
c) save the text of the article into a .txt file.
The…

Lorenzo Romani
- 31
- 4
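The excerpt does not show the cause of the slowdown, so the following is not a diagnosis. It is a hedged sketch of one alternative layout for steps (b) and (c): a bounded thread pool with per-URL error handling, so that a single slow or failing URL does not stall the batch. `fetch_text` and `fetch_all` are hypothetical names, and `max_workers=8` is an arbitrary starting value.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_text(url):
    """Download and parse one article with newspaper3k and return its text."""
    from newspaper import Article  # imported lazily; only needed for real fetches

    article = Article(url)
    article.download()
    article.parse()
    return article.text


def fetch_all(urls, fetch=fetch_text, max_workers=8):
    """Fetch many URLs concurrently; return (texts_by_url, errors_by_url)."""
    texts, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                texts[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[url] = exc
    return texts, errors
```

Threads are used here because the work is mostly waiting on the network; the `fetch` parameter makes it possible to swap in a different downloader.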
0
votes
1 answer
No module named 'newspaper'
I have installed "newspaper3k" both on the command line and inside the Jupyter notebook. Both clearly say the package is installed. But when I use import, it says No module named "newspaper".
It works on colab but not my local kernel (win 10,…

Gary Li
- 1
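This symptom often means the package was installed into a different Python environment than the one the notebook kernel is running, though the excerpt does not confirm that. A small check, assuming that is the situation, is to print the interpreter the kernel uses and ask whether it can locate the `newspaper` module. `module_available` is a hypothetical helper name.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the running interpreter can locate top-level module `name`."""
    return importlib.util.find_spec(name) is not None


# Run this inside the notebook kernel that raises the error:
print("Interpreter:", sys.executable)
print("newspaper importable:", module_available("newspaper"))
```

If the second line prints `False`, installing with that same interpreter (for example `%pip install newspaper3k` in a notebook cell, or `python -m pip install newspaper3k` using the interpreter path printed above) targets the right environment.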
0
votes
1 answer
Newspaper3k filter out bad URL while extracting
With some help ;) I have managed to scrape titles and content from CNN news website and put this in a .csv file.
Now the list with URLs (which has been extracted with another code) has some bad URLs. The code for this is really simple as it just…

Robbie Voort
- 121
- 6
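One way to keep a scraping loop running past bad URLs is to wrap each fetch in `try`/`except`, record what was skipped, and continue. The sketch below assumes a list of URLs and a fetch function that returns `(title, text)`; `scrape_articles` and `fetch_with_newspaper` are hypothetical names, and the resulting rows could then be written out with the `csv` module.

```python
def scrape_articles(urls, fetch):
    """Return (rows, skipped).

    rows    -- list of {'url', 'title', 'text'} dicts for usable articles
    skipped -- list of (url, reason) pairs for URLs that failed or were empty
    """
    rows, skipped = [], []
    for url in urls:
        try:
            title, text = fetch(url)
        except Exception as exc:  # bad URL, network error, parse failure, ...
            skipped.append((url, str(exc)))
            continue
        if not text.strip():
            skipped.append((url, "empty text"))
            continue
        rows.append({"url": url, "title": title, "text": text})
    return rows, skipped


def fetch_with_newspaper(url):
    """Fetch one article with newspaper3k and return (title, text)."""
    from newspaper import Article  # imported lazily; only needed for real fetches

    article = Article(url)
    article.download()
    article.parse()  # typically raises ArticleException if the download failed
    return article.title, article.text


# Usage (not executed here): rows, skipped = scrape_articles(url_list, fetch_with_newspaper)
```

Catching a broad `Exception` is a deliberate simplification for a batch job; narrowing it to the specific exceptions seen in practice would be stricter.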
0
votes
1 answer
News scraping multiple url inside a dataframe
I am trying to use Newspaper3k to scrape the content of a few websites. In the library, the Article() function only takes a single URL. Is it possible to iterate over a dataframe full of URLs to scrape them automatically? My df looks like this:
df =…

ddinfiwis
- 41
- 6
0
votes
1 answer
How to get around Newspaper throwing 503 exceptions for certain webpages
I'm trying to scrape a number of webpages using newspaper3k and my program is throwing 503 Exceptions. Can anyone help me identify the reason for this and help me get around it? To be exact, I'm not looking to catch these exceptions but to…

Christian Adib
- 111
- 8
0
votes
1 answer
Google Search Crawler and Newspaper3k libraries have been combined inside a loop to create an automated scraper, but the code doesn't work. Solution?
In the code below I am scraping Google search links with the help of Newspaper3k. However, the code fails whenever it comes across a link that is not scrapeable or otherwise. How to skip the website which cannot be scraped and mine the results for…

Utkarsh Singh
- 11
- 2