
I would like to scrape the text "VIRGINIA TECH" from the site http://stats.statbroadcast.com/statmonitr/?id=102197 using the package RSelenium.

The CSS selector for the text I would like to scrape is:

.valigntop:nth-child(1) .width6-3-4.marginr

After opening the remote driver and navigating to the site I try:

webElem <- remDr$findElement(using = "css selector", '.valigntop:nth-child(1) .width6-3-4.marginr')
doc <- remDr$getPageSource()[[1]]
current_doc <- read_html(doc)
current_doc <- html_text(current_doc)

This returns the text of the entire page as one big block, not just the text I want, "VIRGINIA TECH".
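The attempt above never uses `webElem`: `getPageSource()` returns the whole page, and `html_text()` is then applied to the whole document. One possible fix is to apply the CSS selector to the parsed page source instead. The sketch below assumes rvest 1.0 or later (for `html_element()`) and uses a small hypothetical HTML string in place of the real page, whose actual markup is not shown here.

```r
library(rvest)

# Hypothetical markup standing in for the rendered page (an assumption).
# In a live session this would be: page_source <- remDr$getPageSource()[[1]]
page_source <- paste0(
  '<html><body><div class="row">',
  '<div class="valigntop"><div class="width6-3-4 marginr">VIRGINIA TECH</div></div>',
  '<div class="valigntop"><div class="width6-3-4 marginr">VISITOR</div></div>',
  '</div></body></html>'
)

css <- ".valigntop:nth-child(1) .width6-3-4.marginr"

doc  <- read_html(page_source)         # parse the HTML string
node <- html_element(doc, css)         # first node matching the selector
team <- html_text(node, trim = TRUE)   # text of that node only
print(team)
```

Selecting the node before calling `html_text()` is what limits the output to the one element rather than the full page.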

The result I would like after scraping:

current_doc
[1] "VIRGINIA TECH"

Any help will be appreciated. Please let me know if any further information is needed.

BoltClock
Dre
  • the following XPath seems to work pretty well `(//div[contains(@class, "teamname")])[1]` – hrbrmstr Feb 24 '16 at 13:54
  • Thank you for your comment. When I try `remDr$findElement(using = "xpath", "(//div[contains(@class, "teamname")])[1]")` it returns an `unexpected symbol` error. – Dre Feb 24 '16 at 14:27
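The `unexpected symbol` error in the comment above most likely comes from the nested double quotes: R ends the string at the quote before `teamname`. A minimal sketch of two ways to quote the XPath follows; `remDr` is assumed to be an already opened remote driver, so those lines are left commented out.

```r
# Wrap the XPath in single quotes so the inner double quotes are kept.
xpath <- '(//div[contains(@class, "teamname")])[1]'

# Equivalent: keep the outer double quotes and escape the inner ones.
xpath_escaped <- "(//div[contains(@class, \"teamname\")])[1]"

# Both spellings produce the same string.
same <- identical(xpath, xpath_escaped)
print(same)

# With a live session (assumes remDr is an open remoteDriver):
# webElem <- remDr$findElement(using = "xpath", value = xpath)
# webElem$getElementText()[[1]]
```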

2 Answers


After reading through this link, I found that the following works well to scrape my desired text.

webElems <- remDr$findElements(using = 'css selector', ".valigntop:nth-child(1) .width6-3-4.marginr")
current_doc <- unlist(lapply(webElems, function(x){x$getElementText()}))

Result:

current_doc
[1] "VIRGINIA TECH"
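If this pattern is needed in several places, it could be wrapped in a small helper. The sketch below is only an illustration: `get_element_texts` is a hypothetical name, and `fake_elem` and `fake_driver` are stand-ins so the example runs without a Selenium server. With a real session, `remDr` would be passed in their place.

```r
# Hypothetical helper: return the text of every element matching a CSS selector.
get_element_texts <- function(driver, css) {
  elems <- driver$findElements(using = "css selector", value = css)
  # getElementText() returns a list of length one, so take its first entry.
  # vapply() yields character(0) when nothing matches.
  vapply(elems, function(el) el$getElementText()[[1]], character(1))
}

# Stand-ins for a live session (assumptions, not part of RSelenium).
fake_elem   <- list(getElementText = function() list("VIRGINIA TECH"))
fake_driver <- list(findElements = function(using, value) list(fake_elem))

current_doc <- get_element_texts(fake_driver, ".valigntop:nth-child(1) .width6-3-4.marginr")
print(current_doc)

# With a live session:
# current_doc <- get_element_texts(remDr, ".valigntop:nth-child(1) .width6-3-4.marginr")
```

Using `vapply()` instead of `unlist(lapply(...))` keeps the return type a character vector even when no element matches.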
Dre

Simple one.

`webElems <- unlist(remDr$findElement(using = 'css selector', ".valigntop:nth-child(1) .width6-3-4.marginr")$getElementText())`

This works too!!

Bharath