How to scrape description of first link from Google?

Question

I followed "How to get google search results"

and it works, but I would like to scrape the description of the first link that Google returns. For the keyword CRAN it is:

<span class="st"><em>CRAN</em> is a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R. Please use the <em>CRAN</em>&nbsp;...</span>

but I don't know what the span section is here. Please provide a solution that does not use RSelenium.

score 1 · Answer 1 · answered Mar 17 '17 at 13:41

Using rvest:

library(rvest)

baseUrl <- 'https://www.google.it/search?q='

query <- 'cran'
# URLencode() keeps multi-word queries valid inside the URL
url <- paste0(baseUrl, URLencode(query, reserved = TRUE))


read_html(url) %>% 
    html_nodes('.st') %>% 
    # This selects only the first result; change the index to select another result,
    # or comment the line out to get all first-page results
    .[1] %>% 
    html_text()
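
To see what the `.st` selector does without hitting Google, here is a minimal sketch that parses the span from the question as a literal string. The variable names `snippet` and `desc` are mine, and the snippet is only the fragment quoted in the question, so the real page markup may differ:

```r
library(rvest)

# The <span> quoted in the question, used as a stand-in for the real page
snippet <- '<span class="st"><em>CRAN</em> is a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R. Please use the <em>CRAN</em>&nbsp;...</span>'

desc <- read_html(snippet) %>%
    html_nodes('span.st') %>%   # CSS selector: <span> elements with class "st"
    html_text()                 # drops the <em> tags and keeps the text

print(desc)
```

The span is just the HTML element that wraps the description, and `class="st"` is what the CSS selector `.st` matches.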

score 0 · Answer 2 · edited Dec 14 '21 at 11:15

You can scrape from the Google Knowledge Graph (the summary box on the right side of your Google search result page).

You can use the Google Knowledge Graph API for this:

Create an application in Google Developers Console

Create authentication credentials

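library(jsonlite)

knowledgegraph <- function(query)
{
  API_Key <- "Your_API_KEY"
  # paste0() avoids the spaces that paste() would insert into the URL
  url <- paste0("https://kgsearch.googleapis.com/v1/entities:search?query=",
                URLencode(query, reserved = TRUE),
                "&key=", API_Key,
                "&limit=1&indent=True")
  fromJSON(url)
}

jdata <- knowledgegraph("cran")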

jdata is a list. You can extract the JSON element for the description with:

For a short description:

jdata[["itemListElement"]][["result"]][["description"]]

For a detailed description:

jdata[["itemListElement"]][["result"]][["detailedDescription"]][["articleBody"]]
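
If you want to check the two extractions without an API key, here is a rough sketch that runs them on a hand-written JSON string. The string only mimics the nesting used above (`itemListElement` > `result` > `description` / `detailedDescription` > `articleBody`); the text values are placeholders I made up, not real API output:

```r
library(jsonlite)

# Hand-written sample with the same nesting as above; values are placeholders
sample_json <- '{
  "itemListElement": [
    {
      "result": {
        "name": "Placeholder name",
        "description": "Placeholder short description",
        "detailedDescription": {
          "articleBody": "Placeholder long description."
        }
      }
    }
  ]
}'

jdata <- fromJSON(sample_json)

short_desc <- jdata[["itemListElement"]][["result"]][["description"]]
long_desc  <- jdata[["itemListElement"]][["result"]][["detailedDescription"]][["articleBody"]]

print(short_desc)
print(long_desc)
```

fromJSON simplifies the array into a data frame, which is why the chained `[[` calls return the values directly.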
