I have written a script to extract certain information from a site. The code runs fine, but it's awfully slow (I have compiled the function and enabled JIT). How can I check whether the delay is due to web page traffic? Any help would be appreciated.

url = "http://www.currys.co.uk/gbuk/computing-accessories/accessories-and-bags/power-cables/power-cables-adaptors/masterplug-bfg2-mp-4-gang-extension-cable-2m-00852134-pdt.html?srcid=369&xtor=AL-1&cmpid=aff~!!!sitenamecm!!!~!!!promotypecm!!!~Computing+Accessorie"

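library(RCurl)    # getURLContent
library(XML)      # htmlTreeParse, xpathSApply, xmlValue, free
library(stringr)  # str_replace_all

# NOTE: availability_option, starttime, store, Avail_Flag and writetofile_c
# are not defined in this excerpt; they are assumed to exist elsewhere in the script.
getpagedata = function (url, uniq_id)
{
  # Download the page and parse it into an internal XML document
  srcpage = getURLContent(url)   # was parenturl, which is not defined in this excerpt
  page = htmlTreeParse(srcpage, useInternalNodes = TRUE, encoding = 'UTF-8')
  link = '0'
  availability = xpathSApply(page, "//span[@class ='available']", xmlValue)

  if (length(availability) > 0)
  {
    if (length(availability_option) > 0)
    {
      # Strip newlines and tabs, then append a separating comma
      availability_option = paste0(toString(str_replace_all(availability_option, "\n|\t", "")), ",")
    }
    # Combine the option text and the availability text into one quoted string
    availability = paste0("'", availability_option, toString(str_replace_all(availability, "\n|\t", "")), "'")
  }

  df2 = data.frame(starttime, Sys.time(), store, uniq_id, Avail_Flag, availability, link)
  writetofile_c(df2)
  free(page)   # release the parsed document's memory
}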

Thanks,

Savi
  • You can time the different parts of your code, such as the fetch time and the time it takes to operate on the fetched content. – Michael Aaron Safyan Feb 16 '14 at 17:55
  • You need to provide details. Preferably reproducible code. – Roland Feb 16 '14 at 17:55
  • Add your code so we can figure out why it is awfully slow. It could be the program flow or the methods used in your code. – rockinfresh Feb 16 '14 at 17:59
  • I have tried to fit the function into the given space. – Savi Feb 16 '14 at 18:13
  • What do you mean by "busy"? Do you mean a [504 timeout](http://en.wikipedia.org/wiki/List_of_HTTP_status_codes) or just that it's slow? **RCurl** can be configured to respond to HTTP status codes but it doesn't sound like that's what you mean by busy. – Thomas Feb 17 '14 at 11:29
  • It's slow; it's not a 504 timeout. – Savi Feb 17 '14 at 14:24
  • I think you need to provide additional detail about what you want. Do you want the function to time out and try again? Do you want it to wait indefinitely? What's your measure of "busy"? – Thomas Feb 17 '14 at 15:24
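
The suggestion in the comments to time the fetch and the processing separately could be sketched roughly as follows. This is only an illustration under stated assumptions: `time_page_steps`, `slow_fetch` and `fast_parse` are hypothetical names that do not appear in the question, and the stand-in functions simulate a slow download so the sketch runs without network access. Only base R's `system.time` is used for the measurement.

```r
# Hypothetical helper (not part of the original script): times the download
# and the parsing separately so the slower step can be identified.
time_page_steps = function(url, fetch_fun, parse_fun) {
  # "elapsed" is wall-clock time, which includes time spent waiting on the network
  fetch_time = system.time(src <- fetch_fun(url))[["elapsed"]]
  parse_time = system.time(page <- parse_fun(src))[["elapsed"]]
  c(fetch = fetch_time, parse = parse_time)
}

# Stand-ins (assumptions) so the sketch runs offline:
# the fetch sleeps to imitate a slow server, the parse is trivial.
slow_fetch = function(u) { Sys.sleep(0.2); "<html><body></body></html>" }
fast_parse = function(s) nchar(s)

timings = time_page_steps("http://example.com", slow_fetch, fast_parse)
print(timings)

# With the packages from the question, the call might look like this:
# library(RCurl); library(XML)
# time_page_steps(url, getURLContent,
#                 function(s) htmlTreeParse(s, useInternalNodes = TRUE, encoding = "UTF-8"))
```

If the `fetch` figure dominates, the delay is most likely in the network or the server's response rather than in the R code, in which case compiling the function or enabling JIT would not be expected to help much.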
