I'm trying to help a friend munge the Miami Dolphins football schedule into a tibble.

library(htmltab)
library(tidyr)
library(tibble)

url <- "http://www.espn.com/nfl/team/schedule/_/name/mia"
data <- htmltab(doc = url, which = 1, header = 2)

unique(data)

as_tibble(data)

The table it extracts has repeated headers (variable names), so I'm missing something. I need a little help converting the htmltab output to a tibble. Thanks.

What the table should look like

Ronak Shah
Isaiah

1 Answer

I am using the "rvest" package to get data from websites. I think the main problem is that this website doesn't provide a clean table format that you can use directly, so you have to clean it up to get the desired output.

rm(list=ls())
library(tidyverse)
library(rvest)

##### get data from web #####
url = "http://www.espn.com/nfl/team/schedule/_/name/mia"
tb <- url %>%
  read_html() %>%
  html_table() # this function is actually going to read all tables at this url
rawdata = tb[[1]] # tb is a list and here we only want the first table

#### clean up the data #####
names(rawdata) = as.character(unlist(rawdata[2,])) # use the second row as column names
tmp = rawdata[grepl("from",rawdata$TICKETS),] # select rows whose TICKETS value contains "from"
tmp2 = tmp[,!duplicated(names(tmp))] # delete columns that have duplicated column names
res = as_tibble(tmp2) # convert to tibble

For the cleaning section, I worked step by step by inspecting the data. Of course, there are plenty of other ways to perform the same task.
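As one such alternative, here is a minimal sketch of the same cleaning steps that runs without a network connection. The `raw` data frame is a made-up stand-in for the scraped table (a title row, a header row with a repeated TICKETS column, then game rows and a bye-week row); its values are hypothetical and not taken from ESPN, and the real page layout may differ.

```r
library(tibble)

# Hypothetical stand-in for the scraped table; all values are invented.
raw <- data.frame(
  X1 = c("Regular Season", "WK", "1", "2", "3"),
  X2 = c("Regular Season", "DATE", "Sun, Sep 9", "Sun, Sep 16", "BYE WEEK"),
  X3 = c("Regular Season", "OPPONENT", "vs Team A", "@ Team B", "BYE WEEK"),
  X4 = c("Regular Season", "TICKETS", "Tickets from $50", "Tickets from $75", "BYE WEEK"),
  X5 = c("Regular Season", "TICKETS", "", "", ""),
  stringsAsFactors = FALSE
)

# Take the second row as a plain character vector of header names.
header <- as.character(unlist(raw[2, ]))

# Decide which columns to keep before renaming, so the data frame
# never carries duplicated column names.
keep_cols <- !duplicated(header)

# Drop the title and header rows, keep only the first copy of each column.
body <- raw[-(1:2), keep_cols, drop = FALSE]
names(body) <- header[keep_cols]

# Keep only game rows, identified by "from" in the TICKETS column.
games <- body[grepl("from", body$TICKETS), , drop = FALSE]

res <- as_tibble(games)
print(res)
```

Dropping the duplicated columns before assigning the names means the conversion to a tibble never sees repeated column names, which appears to be what trips up the original attempt.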

Tony416