
I want to compare news articles from different countries by how often they use a specific keyword.

My idea is to scrape Google News using Rcrawler:

Rcrawler(Website = "https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNREZqY0hsNUVnSmtaU2dBUAE?hl=de&gl=DE&ceid=DE%3Ade", MaxDepth = 5, KeywordsFilter = c("Keyword"), KeywordsAccuracy = 99)

And then just count the results I get back. I'm not sure if this is the best method, or if it's even correct, but I'm new to R and it's the best approach I can currently think of.

schneebii
Welcome to Stack Overflow! I've shared an answer to your query below. Please note that these Q/As act as a future reference for users other than you, so your title and post details should reflect that responsibility. I would suggest changing your title to "Scraping Google News with Rvest" or something of the sort, because the current one does not describe the problem. – Aman Jan 01 '21 at 13:50

1 Answer


Since you're using Google News, instead of scraping this way, an easier method would be to access the RSS feed for that particular keyword and pull that into a dataframe. Luckily, there is the {tidyRSS} package that you can use to do just this.

For example, the feed for the keyword "apple" in the India edition is at this URL:

https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en

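You can customize this URL through its query parameters: q holds the search term, while hl, gl and ceid select the language and country edition. This lets you search by geolocation if you wish.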
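To compare several countries, the feed URL can be assembled from its parts rather than typed by hand. The following is a minimal sketch, not an official API: `build_feed_url` is a hypothetical helper, and the way it derives `ceid` (country, a colon, then the language part of `hl`) is an assumption inferred only from the two URLs that appear in this thread.

```r
# Hypothetical helper: builds a Google News RSS search URL for one edition.
# Assumption: ceid is "<gl>:<language part of hl>", as in the URLs in this thread.
build_feed_url <- function(keyword, hl, gl) {
  lang <- sub("-.*", "", hl)  # "en-IN" -> "en", "de" -> "de"
  paste0(
    "https://news.google.com/rss/search",
    "?q=", utils::URLencode(keyword, reserved = TRUE),
    "&hl=", hl,
    "&gl=", gl,
    "&ceid=", gl, ":", lang
  )
}

# Reproduces the example URL from this answer
build_feed_url("apple", hl = "en-IN", gl = "IN")

# German edition, using the hl/gl values from the question's URL
build_feed_url("apple", hl = "de", gl = "DE")
```

Keeping the keyword fixed and varying only `hl` and `gl` gives one feed per country for the same search term.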

After you install tidyRSS, you can implement it like so:

library(tidyRSS)

# Google News RSS feed for the keyword "apple"
feed_url <- "https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en"

# tidyfeed() call adapted from the package vignette
google_news <- tidyfeed(
  feed_url,
  clean_tags = TRUE,
  parse_dates = TRUE
)

This gives you a dataframe with many variables that describe each article. You can choose which ones to keep.
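Since the original goal was to count keyword usage per country, here is a rough sketch of the counting step. `count_keyword` is a hypothetical helper, and it assumes the article titles sit in a column named `item_title`; check `names()` on your own result, since the column name is not confirmed in this thread. The `demo` data frame is made up so the example runs without network access.

```r
# Hypothetical helper: counts rows whose title mentions the keyword
# (case-insensitive, literal match). The column name is an assumption.
count_keyword <- function(feed_df, keyword, column = "item_title") {
  titles <- feed_df[[column]]
  sum(grepl(tolower(keyword), tolower(titles), fixed = TRUE), na.rm = TRUE)
}

# Made-up stand-in for a tidyfeed() result, so the sketch runs offline
demo <- data.frame(
  item_title = c("Apple unveils new phone", "Markets rally", "Why apple prices rose"),
  stringsAsFactors = FALSE
)
count_keyword(demo, "apple")

# With real feeds (needs network access and the tidyRSS package):
# feeds <- c(IN = "https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en")
# sapply(feeds, function(u) count_keyword(tidyRSS::tidyfeed(u), "apple"))
```

An RSS feed typically carries only a limited number of recent items, so counts obtained this way are a rough signal rather than a complete tally.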

Aman