I am trying to render a table on shinyapps.io, but it is populated entirely with NAs. I am scraping NCAA basketball spreads from https://www.vegasinsider.com/college-basketball/odds/las-vegas/. Locally, the table renders fine, but on shinyapps.io all the numeric spreads display as NA. The table only displays correctly on shinyapps.io if all the spread values are characters, but then I cannot perform any math operations on them. As soon as the BetMGM, Caesers, and FanDuel columns are numeric, they display as NA. I'll provide some code and data to help recreate the issue. There were many data cleaning steps that I will skip for the sake of brevity.

@akrun here is the code to scrape the table. After this, I use some regex to split game_info into its components.

# Table Scraping Code

library(rvest)  # read_html(), html_table()
library(dplyr)  # %>%, rename()

url <- read_html("https://www.vegasinsider.com/college-basketball/odds/las-vegas/")

spread_table <- url %>% html_table(fill = TRUE)

spread_table <- spread_table[[8]]


spread_table <- spread_table %>%
  rename(game_info = X1,
         VegasInsiderOpen = X2,
         BetMGM = X3,
         Caesers = X4,
         Circa = X5,
         FanDuel = X6,
         DraftKings = X7,
         PointsBet = X8,
         SuperBook = X9,
         VegasInsiderConsensus = X10)



# A tibble: 8 × 15 (spread_table)

  date   time      away_team_name  home_team_name  BetMGM  Caesers  FanDuel
  <chr>  <chr>     <chr>           <chr>            <dbl>    <dbl>    <dbl>
  12/23  7:00 PM   George Mason    Wisconsin        -11.5    -11.5    -11.5
  12/23  4:00 PM   Liberty         Stanford          -1.5     -2.0     -2.0
  12/23  10:00 PM  BYU             Vanderbilt         4.0      5.5      5.5
  12/24  12:00 AM  South Florida   Hawaii            -4.0     -3.5       NA

An extremely simplified version of the Shiny app:

ui <- fluidPage(
  titlePanel("NCAAB Spreads App"),
  tableOutput("upcoming_games")
)

server <- function(input, output, session) {

  output$upcoming_games <- renderTable({

    spread_table

  })

}

shinyApp(ui, server)

@akrun

Syracuse, Xavier, Ball State, Notre Dame, Boise State, and St. Marys are the favored teams in this subset, but I cannot tell that from the data frame your code produces.

Here is the dataframe below @jpdugo17 so it is not lost

structure(list(
  date = c("12/27", "12/28", "12/28", "12/28", "12/28", "12/28"),
  time = c("6:00 PM", "7:00 PM", "8:00 PM", "8:00 PM", "9:00 PM", "10:00 PM"),
  away_team_name = c("Brown", "Connecticut", "Ball State", "Notre Dame",
                     "Fresno State", "Yale"),
  home_team_name = c("Syracuse", "Xavier", "Northern Illinois", "Pittsburgh",
                     "Boise State", "St. Marys (CA)"),
  VegasInsiderOpen = c(-10.5, -3, -3, -6, -4, -12.5),
  BetMGM = c(-9.5, NA, NA, NA, NA, NA),
  Caesers = c(-10, NA, NA, -3.5, -4, -13),
  Circa = c(-9.5, NA, NA, NA, NA, NA),
  FanDuel = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
  DraftKings = c(-9.5, -3, -2, -3.5, -3.5, -12.5),
  PointsBet = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
  SuperBook = c(-9.5, NA, NA, -4, -4, -13),
  VegasInsiderConsensus = c(-9.5, -3, -2, -4, -4, -13)
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))
bodega18
  • Can you change it to `renderDataTable` and `dataTableOutput` from `DT` and run it again – akrun Dec 23 '21 at 18:55
  • Is your updated code correct? I get all the columns as `character` class for `spread_table` - `sapply(spread_table, class)# game_info VegasInsiderOpen BetMGM Caesers Circa FanDuel DraftKings "character" "character" "character" "character" "character" "character" "character" PointsBet SuperBook VegasInsiderConsensus "character" "character" "character"` – akrun Dec 23 '21 at 18:59
  • @akrun I've tried renderTable, renderDataTable, and render_gt and none have worked. And yes - I called as.numeric() on each sportsbook it is scraping the spreads for. – bodega18 Dec 23 '21 at 19:02
  • I get column values from your code as `VegasInsiderConsensus: chr [1:8] "135½u-10-2 -10" "135½u-10-5½ -10" "127½u-10-2 -10" "135½u-10-2½ -10" ...` which is clearly not numeric – akrun Dec 23 '21 at 19:02
  • It is not clear when you called `as.numeric`. If you are calling `as.numeric` on those columns as in the previous comment, it will only return `NA` – akrun Dec 23 '21 at 19:05
  • @akrun I used regex to remove all "o", "u", "EV", etc. Then I split apart that column with a "-" delimiter to get only the spread values. And then from there made each spread value numeric. Kind of a janky way of doing it but should still work imo. If that makes sense. – bodega18 Dec 23 '21 at 19:06
  • So your question code is not complete. I believe that some part of your split/transformation is not working within Shiny. Without reproducible code, it is not clear where the error comes from, though. From my understanding, if a string contains even a single character that is not part of a valid number and you convert it to `numeric`, it will return `NA` – akrun Dec 23 '21 at 19:07
  • From the extracted `spread_table`, can you do `spread_table1 <- spread_table %>% dplyr::select(game_info, BetMGM, Caesers, FanDuel) %>% tidyr::extract(game_info, into = c("date", "time", "away_team_name", "home_team_name"), "^(\\S+)\\s+([^\n]+)[^A-Za-z]+([^\n]+)[^A-Za-z]+(.*)") %>% dplyr::mutate(across(BetMGM:FanDuel, ~ purrr::map_dbl(stringr::str_replace(str_extract(., "-?[^-u]+(?=\\s)"), "(\\d+)½", "(\\1 + 0.5)"), ~ eval(parse(text = .x)))))` – akrun Dec 23 '21 at 19:53
  • @bodega18 What is your desired output from, for example, `135½u-10-2 -10`? Can you update your question to reflect that? – jpdugo17 Dec 23 '21 at 23:06
  • @akrun You're a wizard my man. Thanks for that, exactly what I needed. I just scraped the website and there's only one upcoming game, so it's hard to tell if it's completely working, but it looks great so far. Appreciate it – bodega18 Dec 24 '21 at 06:36
  • @bodega18 I posted my comment as a solution. If it works, please consider accepting it. Thanks – akrun Dec 24 '21 at 16:35
  • @bodega18 You can use `dput(spread_table)` and copy the result from the console to avoid losing the data. It also avoids making unnecessary requests to the server. – jpdugo17 Dec 27 '21 at 23:44
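
The `dput()` suggestion in the last comment can be sketched as follows. This is only an illustration: it uses two rows copied from the data in the question, stored in a plain `data.frame` (an assumption, to avoid extra packages), and the names `games`, `code` and `rebuilt` are made up for the example.

```r
# Two rows taken from the data in the question, as a plain data.frame.
games <- data.frame(
  home_team_name = c("Syracuse", "Xavier"),
  DraftKings     = c(-9.5, -3),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; it is captured as text
# here (in practice you would copy it from the console into the question).
code <- capture.output(dput(games))

# Evaluating that code rebuilds the same data frame without re-scraping.
rebuilt <- eval(parse(text = code))
print(identical(rebuilt, games))  # TRUE
```

Anyone reading the question can then paste the `structure(...)` output and work with the same data, even after the site has changed.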

1 Answer

It seems that, after scraping, spread_table is post-processed in a way that leaves non-numeric characters in the extracted substrings. When as.numeric is applied to a string containing any character that is not part of a valid number, it returns NA.
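
As a minimal sketch of that behaviour, the snippet below applies `as.numeric` to one of the raw cell values quoted in the comments (`135½u-10-2 -10`) and to an already clean string. The variable names are only for illustration.

```r
# One raw scraped value (quoted in the comments) and one clean value.
# "\u00bd" is the "½" character, written as an escape for portability.
raw_value   <- "135\u00bdu-10-2 -10"
clean_value <- "-11.5"

# as.numeric() returns NA (with a coercion warning) when the whole string
# is not a valid number; suppressWarnings() only hides the warning.
raw_num   <- suppressWarnings(as.numeric(raw_value))
clean_num <- as.numeric(clean_value)

print(raw_num)    # NA
print(clean_num)  # -11.5
```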

In the code below, we select the columns of interest after scraping, then use `tidyr::extract` to split the 'game_info' column into 'date', 'time', 'away_team_name' and 'home_team_name' with a regex whose capture groups (`(...)`) pick out each piece. `^(\\S+)` captures the first group as one or more non-whitespace characters from the start (`^`) of the string. It is followed by one or more whitespace characters (`\\s+`), then the second group captures one or more characters that are not a newline (`([^\n]+)`), followed by one or more characters that are not letters (`[^A-Za-z]+`). The third group again captures one or more non-newline characters, followed by one or more non-letter characters, and the last group captures the rest of the characters (`(.*)`).

Then we loop across the columns 'BetMGM' to 'FanDuel' and extract the substring that starts with an optional `-`, contains no `u` or further `-`, and is followed by whitespace (`(?=\\s)`). In that substring, the fraction `½` is replaced with `+ 0.5` (as there was only a single fraction), and finally each resulting string is parsed and evaluated to give a numeric value.
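
To see what the four capture groups pick up, here is a small base R sketch. The `game_info` string is hypothetical: the real layout of that column is not shown in the question, so the sample is an assumption made purely for illustration and the actual scraped text may differ.

```r
# Hypothetical game_info value: the real layout on the site is not shown in
# the question, so this string is an assumption made for illustration only.
game_info <- "12/23  7:00 PM\n545 George Mason\n546 Wisconsin"

# The pattern from the answer, with its four capture groups.
pattern <- "^(\\S+)\\s+([^\n]+)[^A-Za-z]+([^\n]+)[^A-Za-z]+(.*)"

# regexec() + regmatches() return the full match followed by each group.
m <- regmatches(game_info, regexec(pattern, game_info, perl = TRUE))[[1]]
parts <- m[-1]  # drop the full match, keep the four groups

print(parts)  # "12/23" "7:00 PM" "George Mason" "Wisconsin"
```

On this sample, the two `[^A-Za-z]+` pieces consume the newline and the leading digits before each team name, which is why the names come out clean.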

library(dplyr)
library(tidyr)
library(purrr)
library(stringr)  # needed for str_extract() and str_replace()
spread_table1 <- spread_table %>%
   dplyr::select(game_info, BetMGM, Caesers, FanDuel) %>% 
   # split game_info into date, time, away team and home team
   tidyr::extract(game_info, into = c("date", "time", "away_team_name", 
    "home_team_name"), "^(\\S+)\\s+([^\n]+)[^A-Za-z]+([^\n]+)[^A-Za-z]+(.*)")  %>% 
   # keep the spread, rewrite e.g. "5½" as "(5 + 0.5)", then evaluate it
   dplyr::mutate(across(BetMGM:FanDuel, ~
    purrr::map_dbl(stringr::str_replace(stringr::str_extract(., "-?[^-u]+(?=\\s)"), 
           "(\\d+)½", "(\\1 + 0.5)"), ~ eval(parse(text = .x)))))
akrun
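
For readers who would rather avoid `eval(parse())`, the spread conversion can also be sketched in base R. This is a different technique from the one in the answer, not the answer's method: `parse_spread` is a hypothetical helper that reuses the answer's extraction pattern and then swaps `½` for `.5` before calling `as.numeric`. It assumes `½` is the only fraction that occurs, and it is tried here only on the two raw values quoted in the comments.

```r
# Hypothetical helper (not from the answer): pull the spread out of a raw
# cell and convert it to a number without eval(parse()).
parse_spread <- function(x) {
  # Same extraction pattern as in the answer: optional "-", then characters
  # other than "-" or "u", that are followed by whitespace.
  m <- regexpr("-?[^-u]+(?=\\s)", x, perl = TRUE)
  hit <- !is.na(m) & m != -1
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)
  # Assumption: "½" ("\u00bd") is the only fraction, so "5½" becomes "5.5".
  as.numeric(sub("\u00bd", ".5", out, fixed = TRUE))
}

# The two raw values quoted in the comments.
raw <- c("135\u00bdu-10-2 -10", "135\u00bdu-10-5\u00bd -10")
spreads <- parse_spread(raw)
print(spreads)  # -2.0 -5.5
```

Cells where the pattern finds nothing come back as `NA` instead of raising an error.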
  • Can you please explain what the `[^A-Za-z]+([^\n]+)` regex means? – jpdugo17 Dec 24 '21 at 16:49
  • @jpdugo17 I added some description. Hope it helps – akrun Dec 24 '21 at 16:58
  • @akrun your regex to strip the spreads for each game: is it pulling the spread # from the home or away team's perspective? If you scrape now, there is only 1 upcoming game. But when I looked the other day, it looked like some of them were mismatched. – bodega18 Dec 27 '21 at 17:37
  • @bodega18 If you look at the extract step, it is extracting from the game_info column: the third capture group is anything that is not a newline, and the fourth is the rest of the characters (`(.*)`). It is possible that your original column has some cases that don't match the regex – akrun Dec 27 '21 at 17:40
  • @akrun Your code is only pulling the spread for the favored team, indicated by the "-" sign. I have no way of telling which team is favored though (home or away). Is there a way to get the spread always from the home team's perspective? So if the home team was an underdog it would show as a positive number. What I have highlighted in yellow in the screenshot is what I need. It appears the home team is always the one appearing second in the game info column. Thanks – bodega18 Dec 27 '21 at 23:36
  • @bodega18 Can you post this as a new question with all the patterns? I am not clear from your comments – akrun Dec 28 '21 at 17:50