
This is the page I'm trying to scrape: http://www.footballlocks.com/nfl_point_spreads_week_1.shtml . I want to end up with a simple data.frame with 4 columns so I can perform further analysis. I have tried using the XML package, but without much luck. Thanks for your help.

library(XML)

# returns a list with one element per <table> found on the page
week.1 <- readHTMLTable("http://www.footballlocks.com/nfl_point_spreads_week_1.shtml")
str(week.1)
hrbrmstr
RcodeNFL
  • What exactly did you try? What does "not much luck" mean exactly? What were you unable to accomplish? Right now it sounds like you're just asking someone to write the code for you rather than asking a specific programming question. – MrFlick Jan 30 '15 at 16:20
  • Are you looking for this: week.1 <- readHTMLTable("http://www.footballlocks.com/nfl_point_spreads_week_1.shtml", which=1) – cory Jan 30 '15 at 16:26
  • I can see why you'd be struggling with that site. Many, many, many levels of nested tables. Worst. @cory, did you try that? I doubt the content of that data frame is what the OP is looking for (footballlocks is just a horribly crafted site) – hrbrmstr Jan 30 '15 at 16:30
  • cory, I am only looking for the first table on the page, i.e. week 1 for 2014. MrFlick, I am new to data scraping in R, so any help in the right direction would be greatly appreciated. – RcodeNFL Jan 30 '15 at 16:35

2 Answers


rvest can do this. You can use an XPath to find all the 4-column tables (the ones declared with a cols="4" attribute) thusly:

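library(rvest)

url <- "http://www.footballlocks.com/nfl_point_spreads_week_1.shtml"

# read_html() replaces the deprecated html() function
pg <- read_html(url)

# select only the tables declared with cols="4" (the odds tables)
tabs <- pg %>% html_nodes(xpath="//table[@cols='4']")

# the first match is the week 1 table
html_table(tabs[[1]], header=TRUE)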

##    Date & Time        Favorite Spread     Underdog
## 1  9/4 8:35 ET      At Seattle   -5.0    Green Bay
## 2  9/7 1:00 ET     New Orleans   -3.0   At Atlanta
## 3  9/7 1:00 ET    At St. Louis   -3.0    Minnesota
## 4  9/7 1:00 ET   At Pittsburgh   -6.0    Cleveland
## 5  9/7 1:00 ET At Philadelphia  -10.0 Jacksonville
## 6  9/7 1:00 ET      At NY Jets   -6.5      Oakland
## 7  9/7 1:00 ET    At Baltimore   -1.0   Cincinnati
## 8  9/7 1:00 ET      At Chicago   -7.0      Buffalo
## 9  9/7 1:00 ET      At Houston   -3.0   Washington
## 10 9/7 1:00 ET  At Kansas City   -3.0    Tennessee
## 11 9/7 1:00 ET     New England   -4.0     At Miami
## 12 9/7 4:25 ET    At Tampa Bay   -4.5     Carolina
## 13 9/7 4:25 ET   San Francisco   -3.5    At Dallas
## 14 9/7 8:30 ET       At Denver   -8.5 Indianapolis
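To see why the cols attribute filter matters on a page full of nested tables, here is a minimal offline sketch. The HTML string is a hypothetical, much-simplified stand-in for the real page (which nests tables far more deeply): one layout table wrapping the 4-column odds table. It assumes rvest (and its xml2 backend) is installed; read_html() parses a literal HTML string the same way it parses a URL.

```r
library(rvest)

# Hypothetical, simplified stand-in for footballlocks.com: an outer layout
# table wrapping the 4-column odds table.
html_txt <- '<html><body>
<table><tr><td>
  <table cols="4">
    <tr><th>Date &amp; Time</th><th>Favorite</th><th>Spread</th><th>Underdog</th></tr>
    <tr><td>9/4 8:35 ET</td><td>At Seattle</td><td>-5.0</td><td>Green Bay</td></tr>
    <tr><td>9/7 1:00 ET</td><td>New Orleans</td><td>-3.0</td><td>At Atlanta</td></tr>
  </table>
</td></tr></table>
</body></html>'

pg <- read_html(html_txt)

# A plain "//table" query matches every table, layout wrapper included...
all_tabs <- html_nodes(pg, xpath = "//table")
length(all_tabs)  # 2

# ...but only the data table carries cols="4", so the filter skips the wrapper
odds_tabs <- html_nodes(pg, xpath = "//table[@cols='4']")
length(odds_tabs)  # 1

week1 <- html_table(odds_tabs[[1]], header = TRUE)
```

This is also why grabbing the first table by position (e.g. which=1) tends to return a layout wrapper rather than the odds.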

If one needs to kick it up old school-like with the XML package:

library(XML)

url <- "http://www.footballlocks.com/nfl_point_spreads_week_1.shtml"

doc <- htmlParse(url)

# doc[xpath] returns the matching nodes; convert the first 4-column table
readHTMLTable(doc["//table[@cols='4']"][[1]])

(same output)
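Once you have the 4-column table, a little base R cleanup can make it easier to analyze. This is a minimal sketch under stated assumptions: it assumes (from the listing above, not from any site documentation) that an "At " prefix marks the home team, and the output column names (datetime, favorite, underdog, spread, fav_home, home) are hypothetical choices. To run offline, it rebuilds three rows of the week 1 output by hand instead of scraping.

```r
# Three rows copied from the week 1 output, entered by hand so this runs offline
week1 <- data.frame(
  `Date & Time` = c("9/4 8:35 ET", "9/7 1:00 ET", "9/7 1:00 ET"),
  Favorite = c("At Seattle", "New Orleans", "At St. Louis"),
  Spread = c("-5.0", "-3.0", "-3.0"),
  Underdog = c("Green Bay", "At Atlanta", "Minnesota"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Assumption: "At " marks the home team. Neutral-site games (no "At" on
# either side) would need separate handling.
is_home <- function(team) startsWith(team, "At ")
strip_at <- function(team) sub("^At ", "", team)

clean <- data.frame(
  datetime = week1[["Date & Time"]],
  favorite = strip_at(week1$Favorite),
  underdog = strip_at(week1$Underdog),
  # as.numeric() is harmless if html_table() already converted the column
  spread = as.numeric(week1$Spread),
  fav_home = is_home(week1$Favorite),
  stringsAsFactors = FALSE
)

# Name the home team explicitly for each game
clean$home <- ifelse(clean$fav_home, clean$favorite, clean$underdog)
clean
```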

hrbrmstr

Pinnacle Sports has an API you can use if you want real-time NFL odds. It may suit your purposes better than scraping a single week of odds from that web page; it's a commonly used source for football line analytics.

Sam Firke