I'm trying to scrape accounting results of publicly listed companies in R with the RSelenium package, from the Brazilian Securities and Exchange Commission (CVM) page below: https://www.rad.cvm.gov.br/ENET/frmGerenciaPaginaFRE.aspx?NumeroSequencialDocumento=108692&CodigoTipoInstituicao=1

# Load the required packages
library(RSelenium)
library(glue) 
library(tidyverse) 
library(dplyr) 

# Start the Selenium server
rD <- rsDriver(port = 4569L,
               # Chrome version the WebDriver should use
               chromever = '94.0.4606.61',
               # Suppress console output
               verbose = FALSE)

# Create the driver used from R
remDr <- remoteDriver(
  remoteServerAddr = "localhost",
  port = 4569L,
  browserName = "chrome"
)

# Open the browser session and load the page

remDr$open()
remDr$navigate("https://www.rad.cvm.gov.br/ENET/frmGerenciaPaginaFRE.aspx?NumeroSequencialDocumento=108692&CodigoTipoInstituicao=1")

# Select the type of financial statement

zelecti <- remDr$findElement(using = "id", 
                             value = "cmbGrupo")
zelecti2 <- zelecti$findChildElements(using = "xpath", 
                                      value = "//option")

zelecti2[[3]]$clickElement()
zelecti2[[12]]$clickElement()

I run this code segment and it works fine. However, when I try to capture the first row of the table using XPath, RSelenium cannot find the row element, and the row object comes back as an empty list.

# Capture the table
tabela <- remDr$findElement(using = 'id', value = "iFrameFormulariosFilho")
# Capture row 1
row <- tabela$findChildElements(using = 'xpath', value = "//tbody//tr[2]")
# Capture the row's content
row_content <- sapply(row, function(x) x$getElementText())

The iframe id seems to refer to the correct object, since this command returns no error and I have checked it in the Chrome console. I have also tried to capture other elements of the table, but RSelenium still cannot read them.

Is there any other way I can capture table content from this link?
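
One possible alternative, sketched here under assumptions: once the driver's context is inside the iframe's document (for example via remDr$switchToFrame()), the page source can be handed to rvest and parsed as a whole table instead of row by row. The helper name table_from_source is hypothetical, and rvest is assumed to be installed (it normally comes with tidyverse).

```r
library(rvest)

# Hypothetical helper: turn one <table> from an HTML string into a data frame.
# `html` is a page-source string, `table_id` the id attribute of the table.
table_from_source <- function(html, table_id) {
  doc <- read_html(html)
  # Look the table up by id with a CSS selector
  node <- html_element(doc, paste0("#", table_id))
  # html_element() returns an "xml_missing" object when nothing matches
  if (inherits(node, "xml_missing")) return(NULL)
  html_table(node)
}

# Usage, assuming remDr is already switched into the iframe:
# src <- remDr$getPageSource()[[1]]
# dados <- table_from_source(src, "some_table_id")  # placeholder id
```

Parsing the source once is usually less chatty than fetching each cell through the driver, but it only sees whatever HTML is loaded at the moment getPageSource() is called.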

  • You probably need to switch to the iframe content. I assume RSelenium, like other Selenium flavours, provides this as a method? – QHarr Oct 19 '21 at 18:46
  • If that doesn't solve the issue drop me a comment here. – QHarr Oct 19 '21 at 18:48
  • I tried and now it seems to work: tabela <- remDr$findElements("css", "iframe"); remDr$switchToFrame(tabela[[1]]); coluna <- remDr$findElement(using = 'xpath', value = '//*[@id="ctl00_cphPopUp_tbDados"]/tbody/tr[2]') – Tiago.Barreira Oct 19 '21 at 19:51
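
For readers landing here, the idea in the last comment can be wrapped up roughly as below. This is only a sketch: the helper name read_frame_text is hypothetical, and it assumes remDr is an open RSelenium remoteDriver as in the question. The point is that elements inside an iframe belong to a separate document, so the search has to run from the driver after switchToFrame(), not as a child search on the iframe element itself.

```r
# Hypothetical helper: read the text of elements that live inside an iframe.
# `remDr` is assumed to be an open RSelenium remoteDriver.
read_frame_text <- function(remDr, frame_id, xpath) {
  # Locate the <iframe> element in the top-level document
  frame <- remDr$findElement(using = "id", value = frame_id)
  # Move the driver's context into the iframe's own document
  remDr$switchToFrame(frame)
  # Switch back to the top-level document on exit, even after an error
  on.exit(remDr$switchToFrame(NULL), add = TRUE)
  # Search from the driver, which now sees the iframe's document
  elems <- remDr$findElements(using = "xpath", value = xpath)
  # getElementText() returns a list of length one, so unwrap it
  vapply(elems, function(el) el$getElementText()[[1]], character(1))
}

# Usage with the ids from the question and the comment above:
# read_frame_text(remDr, "iFrameFormulariosFilho",
#                 '//*[@id="ctl00_cphPopUp_tbDados"]/tbody/tr[2]')
```

Passing NULL to switchToFrame() returns the driver to the page's default content, so later calls such as selecting another option in cmbGrupo keep working.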

0 Answers