
rvest: select option

I think this is easiest to explain with a reproducible example.

Website: http://www.verema.com/vinos/portada

I want to get the types of wines (Tipos de vinos). The relevant HTML is:

<select class="campo select" id="producto_tipo_producto_id" name="producto[tipo_producto_id]">
<option value="">Todos</option>
<option value="211">Tinto</option>
<option value="213">Blanco</option>
<option value="215">Rosado</option>
<option value="216">Espumoso</option>
<option value="217">Dulces y Generosos</option></select>

XPath :  //*[@id="producto_tipo_producto_id"]  or
CSS : #producto_tipo_producto_id  or
Class: campo select

I want a data.frame like this:

211 Tinto
213 Blanco
215 Rosado
216 Espumoso
217 Dulces y Generosos

My code (R):

library(rvest)

Pagina.R <- html(x = "http://www.verema.com/vinos/portada")

text <- Pagina.R %>% 
  html_nodes(xpath='//*[@id="producto_tipo_producto_id"]')%>%
  html_text() 
text

values <- Pagina.R %>% 
  html_nodes(xpath='//*[@id="producto_tipo_producto_id"]')%>%
  html_attr("option value")       #problem????
values

Res <- data.frame(text = text, values = values, stringsAsFactors = FALSE)
Res  # problem  

Suggestions?

Thank you.


Update

Revised, functioning code:

library(rvest)

# read_html() is the current name; html() was deprecated in its favour
Pagina.R <- read_html(x = "http://www.verema.com/vinos/portada")

# Select the <option> children rather than the <select> element itself
text <- Pagina.R %>%
  html_nodes(xpath = '//*[@id="producto_tipo_producto_id"]/option') %>%
  html_text()
text

# The attribute is called "value"; "option value" is not an attribute name
values <- Pagina.R %>%
  html_nodes(xpath = '//*[@id="producto_tipo_producto_id"]/option') %>%
  html_attr("value")
values

Res <- data.frame(text = text, values = values, stringsAsFactors = FALSE)
Res
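
The same extraction can also be sketched offline with a CSS selector instead of XPath. This is only a minimal sketch: it assumes the page still contains the <select> shown in the question, which is pasted here as a string so that no network access is needed. The names snippet, pagina, opciones and res are made up for illustration. It also drops the "Todos" entry, whose value is empty, to match the data.frame asked for above.

```r
library(rvest)

# Inline copy of the <select> from the question (assumption: the live page
# still has this structure), so the sketch runs without a network connection.
snippet <- '<select class="campo select" id="producto_tipo_producto_id" name="producto[tipo_producto_id]">
<option value="">Todos</option>
<option value="211">Tinto</option>
<option value="213">Blanco</option>
<option value="215">Rosado</option>
<option value="216">Espumoso</option>
<option value="217">Dulces y Generosos</option></select>'

pagina <- read_html(snippet)

# CSS descendant selector: every <option> inside the element with this id.
opciones <- html_nodes(pagina, css = "#producto_tipo_producto_id option")

res <- data.frame(
  values = html_attr(opciones, "value"),
  text = html_text(opciones),
  stringsAsFactors = FALSE
)

# Drop the catch-all "Todos" entry, whose value attribute is empty.
res <- res[res$values != "", ]
rownames(res) <- NULL
res
```

To run it against the live site, pass the URL to read_html() in place of snippet; the selector and the rest of the code stay the same.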
Dan Solovay
Javier Marcuzzi

1 Answer


Here is another approach to consider, using RSelenium:

library(RSelenium)

shell('docker run -d -p 4446:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4446L, browserName = "firefox")
url <- "http://www.verema.com/vinos/portada"
remDr$open()
remDr$navigate(url)
remDr$screenshot(TRUE)

web_Obj_Accept <- remDr$findElement("id", "cookies-todas")
web_Obj_Accept$clickElement()

list_Web_Obj_Product_Of_Vine <- remDr$findElements("class name", "vrm-Search_Filter-vinos")
list_Type_Wine <- lapply(X = list_Web_Obj_Product_Of_Vine, FUN = function(x) x$getElementText()[[1]])

nb_Type_Wine <- length(list_Type_Wine)
list_Values <- list()

for(i in 1 : nb_Type_Wine)
{
  id <- tolower(paste0("producto_vino_", list_Type_Wine[[i]]))
  id <- stringr::str_replace_all(id, pattern = " y ", replacement = "_")
  id <- stringr::str_replace_all(id, pattern = "dulces_generosos", replacement = "dulce_generoso")
  web_Obj <- remDr$findElement("id", id)
  list_Values[[i]] <- web_Obj$getElementAttribute("value")
}

vec_Type_Wine <- unlist(list_Type_Wine)
vec_Values <- unlist(list_Values)

df <- data.frame(wine_Type = vec_Type_Wine, values = vec_Values)
df

        wine_Type values
1              Todos       
2              Tinto    211
3             Blanco    213
4             Rosado    215
5             Vermut    237
6           Espumoso    216
7 Dulces y Generosos    217
Emmanuel Hamel