It has been a while since I last visited Stack Overflow. I have a problem parsing an HTML page. I am trying to parse the following link:

edata <- read_html("https://mmiconnect.in/app/ep-2022/registration/show-catalogue")

But I am not able to extract anything from the page with html_nodes. I tried every class name I could find, with no result.

I am trying to get the names of all the companies that participated in the expo. I tried various class selectors, for example:

html_nodes(edata, '.fuse-widget-front .mat-elevation-z4 .m-2 .bg-white')

But this returns no results.
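
One thing worth checking with a selector like the one above: spaces between class names mean "descendant of", so that selector only matches a `.bg-white` element nested inside three other elements. If the four classes sit on a single element, the class names have to be written without spaces. The sketch below shows the difference on a made-up fragment; the markup and the company name are hypothetical and are not taken from the real page. Note also that no selector will match if the content is added by JavaScript after the initial page load, since read_html only sees the HTML the server returns.

```r
library(rvest)

# Hypothetical fragment for illustration only, NOT the real page markup:
# a single <div> carrying all four classes.
html <- read_html('
  <div class="fuse-widget-front mat-elevation-z4 m-2 bg-white">
    <span>Example Co</span>
  </div>')

# Descendant selector (spaces): needs four nested elements, one class each.
descendant <- html_nodes(html, ".fuse-widget-front .mat-elevation-z4 .m-2 .bg-white")

# Compound selector (no spaces): one element that has all four classes.
compound <- html_nodes(html, ".fuse-widget-front.mat-elevation-z4.m-2.bg-white")

length(descendant)  # 0
length(compound)    # 1
```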

The company names that I am trying to download

Alphaneo

1 Answer

I was able to parse the HTML with the following code:

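library(RSelenium)
library(rvest)
url <- "https://mmiconnect.in/app/ep-2022/registration/show-catalogue"

# Start a Selenium Firefox container (requires Docker).
# shell() is Windows-only; use system() on other platforms.
shell('docker run -d -p 4445:4444 selenium/standalone-firefox')

# Connect to the Selenium server and load the page in a real browser,
# so that JavaScript-generated content is rendered.
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate(url)

# Hand the rendered page source to rvest and read the first image URL.
htmltxt <- remDr$getPageSource()[[1]]
read_html(htmltxt) %>% html_node(xpath = '//*/img') %>% html_attr('src')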

[1] "https://mmiconnectstorage.azureedge.net/global-manual-upload/ep-2022-visitor-reg-banner.jpg"
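
Once the rendered source is available as a string, the same rvest calls can pull out text instead of an image attribute. The following is only a sketch: the string stands in for `htmltxt`, and the `company-name` class and the two names are hypothetical placeholders, so the selector has to be replaced with whatever the rendered page actually uses (inspect it in the browser's developer tools).

```r
library(rvest)

# Stand-in for htmltxt; hypothetical markup, NOT the real page source.
rendered <- '
  <div class="card"><span class="company-name"> Example Co A </span></div>
  <div class="card"><span class="company-name">Example Co B</span></div>'

# Select every matching node and extract its text, trimming whitespace.
company_names <- read_html(rendered) %>%
  html_nodes(".company-name") %>%
  html_text(trim = TRUE)

company_names  # "Example Co A" "Example Co B"
```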
Emmanuel Hamel