
I'm trying to scrape this URL [of tennis league scores][1].

[1]: http://tennislink.usta.com/leagues/Main/statsandstandings.aspx#&&s=2%7C%7C%7C%7C4.0%7C%7CM%7C%7C2016%7C%7C9%7C%7C310%7C%7C

My goal is to automate scraping the results of my teams for analysis.

Using rvest and phantomJS I can easily scrape the table at the above link and create an R data frame with the five columns.
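For reference, the table-scraping step that already works looks something like the sketch below. Since the live page needs JavaScript to render, a tiny inline HTML snippet (with made-up team data and a simplified link id) stands in for the phantomJS-rendered output:

```r
library(rvest)

# Stand-in for the phantomJS-rendered page; the real page needs JS to render.
# The id and row contents here are hypothetical.
html <- xml2::read_html('
  <table>
    <tr><th>Team</th><th>W</th><th>L</th></tr>
    <tr>
      <td><a id="LinkButton1"
             href="javascript:__doPostBack(\'x\',\'\')">My Team</a></td>
      <td>5</td><td>2</td>
    </tr>
  </table>')

# The table itself parses cleanly into a data frame...
standings <- html %>% html_node("table") %>% html_table()

# ...but the href attribute is only a __doPostBack call, not a real URL.
hrefs <- html %>% html_nodes("a") %>% html_attr("href")
```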

However, I also want to capture the href for each row so that I can follow the link and scrape the details behind it. When I "inspect" the first element of a row (the element with the embedded link), I don't see a URL; instead I see this:

    <a id="ctl00_mainContent_rptYearForTeamResults_ctl00_rptYearTeamsInfo_ctl16_LinkButton1" href="javascript:__doPostBack('ctl00$mainContent$rptYearForTeamResults$ctl00$rptYearTeamsInfo$ctl16$LinkButton1','')" class="">Text appears here that I can easily scrape</a>
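That href carries no destination URL: `__doPostBack('target', 'argument')` submits the page's hidden form with the first argument placed in the `__EVENTTARGET` field, and the server decides what to render. One self-contained thing that can be done is pulling that event target out of the href string with a regular expression (a sketch, using the anchor shown above):

```r
href <- "javascript:__doPostBack('ctl00$mainContent$rptYearForTeamResults$ctl00$rptYearTeamsInfo$ctl16$LinkButton1','')"

# Extract the first argument of __doPostBack: this is the value ASP.NET
# writes into the hidden __EVENTTARGET field when the form posts back.
event_target <- sub("^javascript:__doPostBack\\('([^']*)'.*", "\\1", href)
event_target
# "ctl00$mainContent$rptYearForTeamResults$ctl00$rptYearTeamsInfo$ctl16$LinkButton1"
```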

I've searched for how to scrape __doPostBack links in R but have not found anything useful. I did find references to RSelenium and have looked through the RSelenium documentation on CRAN, but could not find anything that deals with __doPostBack.

I also found references to phantomJS, which allowed me to scrape the table.

I have successfully scraped HTML at other times programmatically using R and rvest, including capturing URLs embedded directly in the HTML with href=, following those URLs programmatically, and continuing the scrape across thousands of records.

However, __doPostBack has stumped me - I have no JavaScript skills.

I've tried to find clues with "inspect element" that would let me simulate the __doPostBack in R, but nothing jumps out at me.

I would appreciate any help.

LWRMS
  • The only way to get those URLs is to actually virtually click on them in selenium and grab the new location URI. This is a wretched SharePoint/.NET-backed site. – hrbrmstr Aug 23 '16 at 19:22
  • Thanks for the post. I'll have to dig into Rselenium and learn how to do what you suggest. – LWRMS Aug 23 '16 at 22:37
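The commenter's suggestion - virtually clicking the link in a real browser - could be sketched in RSelenium roughly as below. The browser choice, selector, and cleanup are assumptions to adapt to your setup, and this needs a Selenium-driven browser available, so it is an untested outline rather than a working solution:

```r
library(RSelenium)

# Start a Selenium-driven browser (assumes Firefox and a working
# RSelenium install; rsDriver returns both the server and a client).
rD <- rsDriver(browser = "firefox")
remDr <- rD$client

remDr$navigate("http://tennislink.usta.com/leagues/Main/statsandstandings.aspx#&&s=2%7C%7C%7C%7C4.0%7C%7CM%7C%7C2016%7C%7C9%7C%7C310%7C%7C")

# Find a team link by the tail of its id and click it; the click runs
# the page's own __doPostBack, so the server handles the postback for us.
link <- remDr$findElement("css selector", "a[id$='LinkButton1']")
link$clickElement()

# Once the postback completes, capture the new location (as the comment
# suggests) and hand the rendered detail page to rvest as usual.
remDr$getCurrentUrl()
detail <- xml2::read_html(remDr$getPageSource()[[1]])

remDr$close()
rD$server$stop()
```

Looping over all the `LinkButton` anchors, clicking each, scraping, and navigating back would then cover every row of the standings table.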

0 Answers