
I followed "How to get google search results" and it works, but I would like to scrape the description of the first link that Google returns. For the keyword CRAN, it is:

<span class="st"><em>CRAN</em> is a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R. Please use the <em>CRAN</em>&nbsp;...</span>

but I don't know how to select that span element. Please provide a solution that doesn't use RSelenium.


2 Answers


Using rvest:

library(rvest)

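baseUrl <- 'https://www.google.it/search?q='

query <- 'cran'
# Percent-encode the query so multi-word searches still produce a valid URL
url <- paste0(baseUrl, URLencode(query, reserved = TRUE))

read_html(url) %>%
    html_nodes('.st') %>%
    # Select a single snippet by position (index 2 picked the first result when
    # this was written); change the number to select another result, or remove
    # this line to get all first-page results
    '['(2) %>%
    html_text()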
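To see what the `.st` selector matches, you can run it offline against the HTML fragment from the question. The `class="st"` attribute on the `<span>` is what the CSS selector `.st` (or the more specific `span.st`) refers to. This is a minimal sketch that assumes rvest is installed. Google changes its result markup over time, so the `st` class may not appear in current result pages.

```r
library(rvest)

# The snippet from the question, stored as a string (no network request needed)
snippet <- '<div><span class="st"><em>CRAN</em> is a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R. Please use the <em>CRAN</em>&nbsp;...</span></div>'

# read_html() treats a string containing "<" as literal HTML, not a path or URL
page <- read_html(snippet)

# ".st" matches any element whose class list includes "st";
# "span.st" restricts the match to <span> elements
description <- page %>%
  html_nodes("span.st") %>%
  html_text()

print(description)
```

`html_text()` flattens the nested `<em>` tags, so you get the plain description text.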

Alternatively, you can get the description from the Google Knowledge Graph (the summary box on the right side of the Google search results page).

You can use the Google Knowledge Graph Search API for this:

  1. Create an application in the Google Developers Console

  2. Create authentication credentials (an API key)

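    library(jsonlite)

    knowledgegraph <- function(query, api_key = "Your_API_KEY") {
      # paste0() avoids the spaces that paste() would insert between pieces,
      # and only the query is encoded so "&" and "=" keep their meaning
      url <- paste0("https://kgsearch.googleapis.com/v1/entities:search?query=",
                    URLencode(query, reserved = TRUE),
                    "&key=", api_key,
                    "&limit=1&indent=True")
      # fromJSON() downloads the response and parses it into an R list
      fromJSON(url)
    }

    jdata <- knowledgegraph("cran")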
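For reference, here is one way to assemble the request URL so that every parameter value is percent-encoded. This is a base-R-only sketch: `kg_url()` is a hypothetical helper name, and it sends the same parameters as the function above (`query`, `key`, `limit`, `indent`).

```r
# Hypothetical helper (not part of any package): builds a Knowledge Graph
# Search API request URL from named parameters, percent-encoding each value
kg_url <- function(query, api_key, limit = 1) {
  params <- c(query = query, key = api_key,
              limit = as.character(limit), indent = "True")
  # reserved = TRUE also encodes characters such as "&", "=" and "+"
  values <- vapply(params, URLencode, character(1), reserved = TRUE)
  paste0("https://kgsearch.googleapis.com/v1/entities:search?",
         paste(names(params), values, sep = "=", collapse = "&"))
}

kg_url("R language", "YOUR_API_KEY")
```

A space in the query becomes `%20` and an `&` becomes `%26`, so the query cannot break into extra parameters. Passing the resulting string to `fromJSON()` then performs the request.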
    

jdata is a list, and you can extract the description element from it:

For a short description:

jdata[["itemListElement"]][["result"]][["description"]]

For a detailed description:

jdata[["itemListElement"]][["result"]][["detailedDescription"]][["articleBody"]]
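By default, `fromJSON()` simplifies a JSON array of objects into a data frame. As a result, `itemListElement` comes back as a data frame whose `result` column is itself a nested data frame, which is why the `[[...]]` chains above work. The following is a minimal offline check that assumes jsonlite is installed. The JSON string is a hand-written sample that only mimics the shape of an API response. It is not real API output.

```r
library(jsonlite)

# Hand-written sample that mimics the shape of a Knowledge Graph response
# (hypothetical values, for illustration only)
sample_json <- '{
  "itemListElement": [
    {
      "result": {
        "name": "CRAN",
        "description": "Sample short description",
        "detailedDescription": { "articleBody": "Sample detailed description." }
      },
      "resultScore": 1
    }
  ]
}'

jdata <- fromJSON(sample_json)

# itemListElement is a data frame; result and detailedDescription are nested data frames
short_desc <- jdata[["itemListElement"]][["result"]][["description"]]
long_desc  <- jdata[["itemListElement"]][["result"]][["detailedDescription"]][["articleBody"]]

print(short_desc)
print(long_desc)
```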