
I am trying to scrape Google News with pygooglenews. Because Google limits each search to 100 articles, I want to collect more than that by changing the target dates in a for loop. Below is what I have so far, but I keep getting this error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-84-4ada7169ebe7> in <module>
----> 1 df = pd.DataFrame(get_news('Banana'))
      2 writer = pd.ExcelWriter('My Result.xlsx', engine='xlsxwriter')
      3 df.to_excel(writer, sheet_name='Results', index=False)
      4 writer.save()

<ipython-input-79-c5266f97934d> in get_titles(search)
      9 
     10     for date in date_list[:-1]:
---> 11         search = gn.search(search, from_=date, to_=date_list[date_list.index(date)])
     12         newsitem = search['entries']
     13 

~\AppData\Roaming\Python\Python37\site-packages\pygooglenews\__init__.py in search(self, query, helper, when, from_, to_, proxies, scraping_bee)
    140         if from_ and not when:
    141             from_ = self.__from_to_helper(validate=from_)
--> 142             query += ' after:' + from_
    143 
    144         if to_ and not when:

TypeError: unsupported operand type(s) for +=: 'dict' and 'str'

This is my code:

import pandas as pd
from pygooglenews import GoogleNews
import datetime

gn = GoogleNews()

def get_news(search):
    stories = []
    start_date = datetime.date(2021,3,1)
    end_date = datetime.date(2021,3,5)
    delta = datetime.timedelta(days=1)
    date_list = pd.date_range(start_date, end_date).tolist()
    
    for date in date_list[:-1]:
        search = gn.search(search, from_=date.strftime('%Y-%m-%d'), to_=(date+delta).strftime('%Y-%m-%d'))
        newsitem = search['entries']

        for item in newsitem:
            story = {
                'title':item.title,
                'link':item.link,
                'published':item.published
            }
            stories.append(story)

    return stories

df = pd.DataFrame(get_news('Banana'))

Thank you in advance.

Paul P
will418
  • Could you please clarify where this error is coming from, i.e., could you share the full traceback? Also, which `pygooglenews` library are you using exactly? – Paul P Mar 05 '21 at 12:20
  • Hi p3j4p5, thanks for your response. Apologies for lack of explanation. I've updated my post with error messages and also the link to the library that I am using. – will418 Mar 05 '21 at 22:13
  • Thanks @will418 - This helped, I've written an answer. – Paul P Mar 05 '21 at 23:01
  • Thank you, it worked perfectly! Is there anything I can do to support your work? – will418 Mar 24 '21 at 00:54
  • You are very welcome. If you could _accept_ and _upvote_ the answer, that would be much appreciated. – Paul P Mar 24 '21 at 09:31

2 Answers


It looks like you are correctly passing a string into get_news(), which is then passed on as the first argument (search) to gn.search().

However, you're reassigning search to the result of gn.search() in the line:

  search = gn.search(search, from_=date.strftime('%Y-%m-%d'), to_=(date+delta).strftime('%Y-%m-%d'))
# ^^^^^^
# gets overwritten with the result of gn.search()

In the next iteration, this reassigned search is passed into gn.search() as the query, where the line query += ' after:' + from_ fails.

Looking at the pygooglenews source, gn.search() appears to return a dict, which would explain the error: a dict cannot be concatenated with a str.
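
To see the mechanism in isolation, here is a minimal sketch that reproduces the same TypeError without touching the network. Note that fake_search is a hypothetical stand-in I made up for illustration, not the real gn.search(); it only mimics the two relevant behaviours (appending to the query and returning a dict).

```python
def fake_search(query, from_=None):
    """Hypothetical stand-in for gn.search(), for illustration only."""
    query += ' after:' + from_      # same operation as line 142 in the traceback
    return {'entries': []}          # a dict, like the real search result


search = 'Banana'
search = fake_search(search, from_='2021-03-01')  # search is now a dict

try:
    fake_search(search, from_='2021-03-02')       # second loop iteration
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The first call succeeds because search is still a string; the second call is the one that raises.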

To fix this, simply use a different variable, e.g.:

result = gn.search(search, from_=date.strftime('%Y-%m-%d'), to_=(date+delta).strftime('%Y-%m-%d'))
newsitem = result['entries']
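
Putting it together, the whole function might look something like the sketch below. I have made a few assumptions that are not in your code: the client and the date range are passed in as parameters (so client would be your gn object), and the days are stepped with plain datetime instead of pd.date_range. The only change that matters for the error is the result variable.

```python
import datetime


def get_news(client, query, start_date, end_date):
    """Collect stories one day at a time between start_date and end_date.

    `client` is assumed to be an object with a pygooglenews-style
    search(query, from_=..., to_=...) method returning a dict that
    has an 'entries' list.
    """
    stories = []
    delta = datetime.timedelta(days=1)
    day = start_date
    while day < end_date:
        # Keep `query` untouched; store the response under a new name.
        result = client.search(
            query,
            from_=day.strftime('%Y-%m-%d'),
            to_=(day + delta).strftime('%Y-%m-%d'),
        )
        for item in result['entries']:
            stories.append({
                'title': item.title,
                'link': item.link,
                'published': item.published,
            })
        day += delta
    return stories
```

You would then call it as get_news(gn, 'Banana', datetime.date(2021, 3, 1), datetime.date(2021, 3, 5)) and pass the returned list to pd.DataFrame() as before.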
Paul P

pygooglenews returns at most 100 articles per search, so you need to write a loop that scrapes each day separately.
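
As a rough sketch of the per-day loop, the helper below generates one (from_, to_) pair of date strings per day. The name day_windows is my own, and the 'YYYY-MM-DD' format is taken from the question's code; each pair would then be passed to gn.search() as from_ and to_.

```python
import datetime


def day_windows(start_date, end_date):
    """Yield (from_, to_) date strings, one pair per day in the range."""
    delta = datetime.timedelta(days=1)
    day = start_date
    while day < end_date:
        yield day.strftime('%Y-%m-%d'), (day + delta).strftime('%Y-%m-%d')
        day += delta


windows = list(day_windows(datetime.date(2021, 3, 1), datetime.date(2021, 3, 5)))
for from_, to_ in windows:
    print(from_, to_)
```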

Fazi Alnjd