
I want to extract a table periodically from the site below.

The price list changes when a building block name (BLOK 16 A, BLOK 16 B, BLOK 16 C, ...) is clicked. The URL doesn't change; the page changes by triggering

javascript:__doPostBack('ctl00$ContentPlaceHolder1$DataList2$ctl04$lnk_blok','')
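
For context on what that link does: in ASP.NET Web Forms, __doPostBack(eventTarget, eventArgument) copies its two arguments into the hidden form fields __EVENTTARGET and __EVENTARGUMENT and then submits the page's form, which is why the URL stays the same. A minimal base R sketch that pulls those two arguments out of such a link; the helper name parse_postback is made up for illustration:

```r
# Hypothetical helper: extract the two arguments of a __doPostBack() link.
parse_postback <- function(href) {
  pattern <- "__doPostBack\\('([^']*)','([^']*)'\\)"
  m <- regmatches(href, regexec(pattern, href))[[1]]
  if (length(m) < 3) stop("not a __doPostBack link: ", href)
  list(target = m[2], argument = m[3])
}

link <- "javascript:__doPostBack('ctl00$ContentPlaceHolder1$DataList2$ctl04$lnk_blok','')"
pb <- parse_postback(link)
pb$target    # the value to send as __EVENTTARGET
pb$argument  # the value to send as __EVENTARGUMENT (empty here)
```

So reproducing a click from R comes down to sending a POST to the same URL with these two fields plus the page's other hidden fields.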

I've tried three approaches after searching Google and Stack Overflow.

Attempt 1: this doesn't trigger the doPostBack event.

postForm( "http://www.kentkonut.com.tr/tr/modul/projeler/daire_fiyatlari.aspx?id=44", ctl00_ContentPlaceHolder1_DataList2_ctl03_lnk_blok="ctl00$ContentPlaceHolder1$DataList2$ctl03$lnk_blok")

Attempt 2: the Selenium server seems to work (http://localhost:4444/), but the remote driver doesn't navigate. It returns this error: Error in checkError(res) : Undefined error in httr call. httr output: length(url) == 1 is not TRUE

library(RSelenium)
startServer()
remDr <- remoteDriver(remoteServerAddr = "localhost",
                      port = 4444L, browserName = "firefox")
remDr$open()
remDr$getStatus()
remDr$navigate("http://www.kentkonut.com.tr/tr/modul/projeler/daire_fiyatlari.aspx?id=44")

Attempt 3: another way to trigger the doPostBack event. It doesn't navigate either.

library(RCurl)  # provides getURL() and postForm()
base.url <- "http://www.kentkonut.com.tr/tr/modul/projeler/"
event.target <- 'ctl00$ContentPlaceHolder1$DataList2$ctl03$lnk_blok'
action <- "daire_fiyatlari.aspx?id=44"

ftarget <- paste0(base.url, action)
dum <- getURL(ftarget)
event.val <- unlist(strsplit(dum,"__EVENTVALIDATION\" value=\""))[2]
event.val <- unlist(strsplit(event.val,"\" />\r\n\r\n<script"))[1]
view.state <- unlist(strsplit(dum,"id=\"__VIEWSTATE\" value=\""))[2]
view.state <- unlist(strsplit(view.state,"\" />\r\n\r\n\r\n<script"))[1]
web.data <- postForm(ftarget, "form name" = "ctl00_ContentPlaceHolder1_DataList2_ctl03_lnk_blok", 
                   "method" = "POST", 
                   "action" = action, 
                   "id" = "ctl00_ContentPlaceHolder1_DataList2_ctl03_lnk_blok",
                   "__EVENTTARGET"=event.target,
                   "__EVENTVALIDATION"=event.val,
                   "__VIEWSTATE"=view.state)
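
Splitting on the exact text that follows each field is fragile, since any change in whitespace after the tag breaks it. A minimal sketch of a less brittle extraction with a regular expression in base R. The helper name hidden_value and the sample HTML are illustrative only, and it assumes the id attribute is immediately followed by value, as in the markup the code above splits on:

```r
# Hypothetical helper: read the value of a hidden <input> by its id.
# Assumes the markup looks like: id="NAME" value="..."
hidden_value <- function(html, name) {
  pattern <- sprintf('id="%s" value="([^"]*)"', name)
  m <- regmatches(html, regexec(pattern, html))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# Made-up page fragment, only for demonstration.
sample_html <- paste0(
  '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />\r\n',
  '<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz789" />'
)

view.state <- hidden_value(sample_html, "__VIEWSTATE")
event.val  <- hidden_value(sample_html, "__EVENTVALIDATION")
```

A field that is not present comes back as NA, which makes a missing __EVENTVALIDATION easy to spot before posting.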

Thanks for your help.

Selcuk Akbas

1 Answer

library(rvest)
url <- "http://www.kentkonut.com.tr/tr/modul/projeler/daire_fiyatlari.aspx?id=44"
pgsession <- html_session(url)
# Read the price table from the page as first loaded
t <- html_table(html_nodes(read_html(pgsession), css = "#ctl00_ContentPlaceHolder1_DataList1"), fill = TRUE)[[1]]
# Keep only the even-numbered rows, then drop the first of those
even_indices <- seq(2, length(t$X1), 2)
t <- t[even_indices, ]
t <- t[2:(length(t$X1)), ]

EDITED CODE:

library(rvest)
url <- "http://www.kentkonut.com.tr/tr/modul/projeler/daire_fiyatlari.aspx?id=44"
pgsession <- html_session(url)
pgform <- html_form(pgsession)[[1]]
# Re-send the form's hidden fields with __EVENTTARGET set to the clicked link
page <- rvest:::request_POST(pgsession, url,
                             body = list(
                               `__VIEWSTATE` = pgform$fields$`__VIEWSTATE`$value,
                               `__EVENTTARGET` = "ctl00$ContentPlaceHolder1$DataList2$ctl01$lnk_blok",
                               `__EVENTARGUMENT` = "",
                               `__VIEWSTATEGENERATOR` = pgform$fields$`__VIEWSTATEGENERATOR`$value,
                               `__VIEWSTATEENCRYPTED` = pgform$fields$`__VIEWSTATEENCRYPTED`$value,
                               `__EVENTVALIDATION` = pgform$fields$`__EVENTVALIDATION`$value
                             ),
                             encode = "form"
                             )
# To get a different block's table, change __EVENTTARGET above, e.g. to "ctl00$ContentPlaceHolder1$DataList2$ctl02$lnk_blok"

t <- html_table(html_nodes(read_html(page), css = "#ctl00_ContentPlaceHolder1_DataList1"), fill = TRUE)[[1]]
# Keep only the even-numbered rows, then drop the first of those
even_indices <- seq(2, length(t$X1), 2)
t <- t[even_indices, ]
t <- t[2:(length(t$X1)), ]
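
To collect every block rather than one, the same POST can be repeated with a different __EVENTTARGET each time. A rough sketch, assuming the link ids follow the ctl00, ctl01, ctl02, ... pattern listed in the comments and that the hidden fields from the first page load can be reused for each request (neither is verified against the live site). The helper names block_targets, postback_body and fetch_block are hypothetical, and fetch_block uses httr::POST in place of the internal rvest:::request_POST:

```r
# Hypothetical helper: build the __EVENTTARGET names for the first n block links.
block_targets <- function(n) {
  idx <- sprintf("%02d", seq_len(n) - 1L)  # "00", "01", ...
  paste0("ctl00$ContentPlaceHolder1$DataList2$ctl", idx, "$lnk_blok")
}

# Hypothetical helper: combine the page's hidden fields with one event target.
# `fields` is a named list such as list(`__VIEWSTATE` = "...", ...).
postback_body <- function(fields, target) {
  fields[["__EVENTTARGET"]] <- target
  fields[["__EVENTARGUMENT"]] <- ""
  fields
}

# Sends one postback; needs the httr package and network access when called.
fetch_block <- function(url, fields, target) {
  httr::POST(url, body = postback_body(fields, target), encode = "form")
}

targets <- block_targets(4)
body <- postback_body(list(`__VIEWSTATE` = "placeholder"), targets[2])

# Usage (not run here); hidden_fields would hold the __VIEWSTATE,
# __EVENTVALIDATION, etc. values read from pgform:
# pages <- lapply(targets, function(tg) fetch_block(url, hidden_fields, tg))
```

The HTML of each response can then go through the same html_table step as above.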
Bharath
  • Thanks @Bharath, your way of getting the table is very clear. What I need to crawl from that site is not a series of page ids but the sub-pages that appear when BLOK 16 B or BLOK 16 C is clicked, which triggers JS for ctl00$ContentPlaceHolder1$DataList2$ctl00$lnk_blok, ctl00$ContentPlaceHolder1$DataList2$ctl01$lnk_blok, ctl00$ContentPlaceHolder1$DataList2$ctl02$lnk_blok, ctl00$ContentPlaceHolder1$DataList2$ctl03$lnk_blok – Selcuk Akbas Jan 31 '17 at 18:33
  • So do you want to go to these pages? http://www.kentkonut.com.tr/tr/modul/projeler/daire_detay.aspx?id=44&blok=318&daire=01 http://www.kentkonut.com.tr/tr/modul/projeler/daire_detay.aspx?id=44&blok=318&daire=02 – Bharath Jan 31 '17 at 20:45
  • Sorry, no. When you click the yellow links in this picture https://postimg.org/image/edbocvd3z/ the URL doesn't change, but the table below changes. – Selcuk Akbas Feb 01 '17 at 08:31
  • Thanks, that worked. Is this a matter of JS expertise, or something else? – Selcuk Akbas Feb 19 '17 at 21:17
  • @Bharath Tried your technique but could not browse through the "Next" button of a random page that I selected to master the art. Here is the link and I can open a new thread if that makes more sense http://content.smctradeonline.com/Commodity/SpotPrices.aspx?id=24 – Sushanta Deb Sep 21 '17 at 16:21
  • This would have to be in a different thread if you can. We will investigate that. – Bharath Sep 22 '17 at 17:22