
I am new to web scraping with R and would be very grateful for any help.

I am trying to scrape information from this website: https://kotis.kt.gov.lt/

My code looks like this:

#libraries
library(rvest)
library(tidyverse)



url <- "https://kotis.kt.gov.lt/"


#establish session
my_session <- session(url)


#get the form
form_unfilled <-  my_session %>% html_node('form') %>% html_form()
form_filled <- form_unfilled %>% html_form_set("aid_date[from]"='2022-08-25')
results <- session_submit(my_session, form_filled)


first_page <-  as.data.frame(results %>% html_elements(css = 'table') %>% html_table())

The results table on the site has 21 columns, but my code reads only 8 of them.

I will be grateful for any help.

  • I don't see any tables on that page. I'm not clear exactly what you are trying to end up with. – MrFlick Sep 12 '22 at 13:53
  • If you want to see manually, click on "Išsami Paieška" button (next to the search). It unveils a form. The code is able to access this form, and fill in some details to retrieve results. I get those results. However, instead of all columns, I only get 8 columns. – Swapnil Singh Sep 13 '22 at 06:19
  • I looked at it only briefly, but I think you might need Selenium to select the columns properly. By default only 8 columns are selected, and you need to check the boxes for the other columns interactively. That selection is not passed along in the initial GET request but is triggered through JavaScript. Unfortunately it is a bit of a hassle: https://www.r-bloggers.com/2014/12/scraping-with-selenium/ – Datapumpernickel Sep 16 '22 at 09:12
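
One way to check the point about column selection without opening a browser is to list the fields that rvest actually parsed from a form. The sketch below is only an illustration under stated assumptions: it runs on a small inline HTML snippet rather than the live site, and the `columns[]` checkbox name is hypothetical, not taken from kotis.kt.gov.lt.

```r
library(rvest)

# Hypothetical stand-in for the search form; the real page's markup may differ.
# "columns[]" is an invented field name used only for illustration.
page <- read_html('<html><body>
  <form action="/search" method="get">
    <input type="text" name="aid_date[from]" value="">
    <input type="checkbox" name="columns[]" value="recipient" checked>
    <input type="checkbox" name="columns[]" value="amount">
    <button type="submit">Search</button>
  </form>
</body></html>')

# html_form() on a document returns a list of forms; take the first one
form <- html_form(page)[[1]]

# Summarise each parsed field by its name and input type
field_name <- function(f) if (is.null(f$name)) NA_character_ else f$name
field_type <- function(f) if (is.null(f$type)) NA_character_ else f$type
field_info <- data.frame(
  name = unname(vapply(form$fields, field_name, character(1))),
  type = unname(vapply(form$fields, field_type, character(1)))
)
print(field_info)
```

Running the same summary on `html_form(my_session)` from the question would show whether the extra columns appear as form fields at all. If they do not, that would be consistent with the selection being handled by JavaScript.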

1 Answer


Here is one approach to consider, using RSelenium to drive a real browser:

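library(RSelenium)
library(rvest)
url <- "https://kotis.kt.gov.lt/"
# Start a Selenium Firefox container (requires Docker).
# shell() exists only on Windows; use system() on other platforms.
shell('docker run -d -p 4445:4444 selenium/standalone-firefox')
Sys.sleep(5)  # assumed pause so the container can start; adjust as needed
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate(url)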

# Click the button next to the search box
web_Obj <- remDr$findElement("css selector", "body > main > div.bg-light.p-3 > div > form > div.input-group.input-group-search > div.input-group-append.search-btn-wrapper > button")
web_Obj$clickElement()

# Click the first date input
web_Obj_Date1 <- remDr$findElement("xpath", '/html/body/main/div[1]/div/form/div[2]/div/div[1]/div[1]/div/input[1]')
web_Obj_Date1$clickElement()

# Click an entry in the widget opened by the date input
web_Obj_Date2 <- remDr$findElement("xpath", '/html/body/div[1]/div[2]/div/div[2]/div/span[12]')
web_Obj_Date2$clickElement()

# Submit the filter form
web_Obj_Submit <- remDr$findElement("css selector", '#listing-filter > div > div.text-right.link-group > button')
web_Obj_Submit$clickElement()

html_Content <- remDr$getPageSource()[[1]]
read_html(html_Content) %>% html_table()

[[1]]
# A tibble: 25 x 8
   `Pagalbos gavejas`       `Pagalbos teikejas`                                          Pagalbos suteikim~1 Pagal~2 Teisi~3 Pagal~4 Busen~5 ``   
   <chr>                    <chr>                                                        <chr>               <chr>   <chr>   <chr>   <chr>   <lgl>
 1 "MB \"Šonulis\""         Socialines apsaugos ir darbo ministerija                     2022-09-09          753,48~ "VSF2"  Bendro~ Rezerv~ NA   
 2 "UAB \"Cargorest\""      Socialines apsaugos ir darbo ministerija                     2022-09-09          1.596,~ "VSF2"  Bendro~ Rezerv~ NA   
 3 "Tautvydas Kizinis"      Žemes ukio ministerija                                       2022-09-09          149,10~ "LR že~ Nereik~ Iregis~ NA   
 4 "Vytautas Juozas Petkus" Žemes ukio ministerija                                       2022-09-09          953,82~ "LR že~ Nereik~ Iregis~ NA   
 5 "Donatas Malinauskas"    Žemes ukio ministerija                                       2022-09-09          806,17~ "LR že~ Nereik~ Iregis~ NA   
 6 "Dangute Malinauskiene"  Žemes ukio ministerija                                       2022-09-09          640,36~ "LR že~ Nereik~ Iregis~ NA   
 7 "Darius Malinauskas"     Žemes ukio ministerija                                       2022-09-09          676,33~ "LR že~ Nereik~ Iregis~ NA   
 8 "Vaidas Paunksnis"       Valstybinio socialinio draudimo fondo valdybos Kauno skyrius 2022-09-09          83,63 ~ "1991 ~ Nereik~ Iregis~ NA   
 9 "Tadas Pocius"           Žemes ukio ministerija                                       2022-09-09          42,05 ~ "Jurba~ Valsty~ Iregis~ NA   
10 "Rytis Andriulaitis"     Žemes ukio ministerija                                       2022-09-09          762,84~ "Jurba~ Valsty~ Iregis~ NA   
# ... with 15 more rows, and abbreviated variable names 1: `Pagalbos suteikimo data`, 2: `Pagalbos suma`, 3: `Teisinis pagrindas`,
#   4: `Pagalbos rušis`, 5: `Busena`
# i Use `print(n = ...)` to see more rows
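
The printed tibble ends with an unnamed column that contains only NA. If that column is not wanted, it can be dropped after parsing. Below is a minimal sketch on a tiny made-up table, assuming the empty column comes from an empty header cell with empty body cells; the cell values are placeholders, not data from the site.

```r
library(rvest)

# Made-up stand-in for html_Content; values are placeholders, not real records
html_Content <- '<html><body><table>
  <tr><th>Pagalbos gavejas</th><th>Pagalbos suma</th><th></th></tr>
  <tr><td>Recipient A</td><td>1,00</td><td></td></tr>
  <tr><td>Recipient B</td><td>2,00</td><td></td></tr>
</table></body></html>'

# html_table() returns a list of tables; the results table is the first one here
first_page <- html_table(read_html(html_Content))[[1]]

# Keep only the columns that hold at least one non-missing value
has_data <- unname(vapply(first_page, function(col) any(!is.na(col)), logical(1)))
cleaned <- first_page[, has_data, drop = FALSE]
print(cleaned)
```

This only tidies the columns that were scraped. It does not add the columns that are unchecked by default on the site.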
Emmanuel Hamel