I'm using rvest and tidyverse to scrape and process some data off the web.

The website recently changed: some of the data is now split across two tables, and a button switches between them.

I'm trying to figure out how to scrape the data from both. They now seem to share the same CSS class, so I can't work out how to access each one individually.

The code below grabs the "extended snowfall history" table, but I can't figure out how to get the "2022-2023 winter season" data. I'll obviously need some processing and arithmetic to turn the "2022-2023 winter season" data into a new row of "extended snowfall history", but for now I can't even retrieve it.

Currently I have:

library(rvest)
library(tidyverse)

mammoth <- read_html('https://www.mammothmountain.com/on-the-mountain/historical-snowfall')

snow <- mammoth %>%
  html_element('table.css-86hwhl') %>% 
  html_table(header= TRUE, convert = TRUE) %>%
  mutate_if(is.character,as.factor) %>%
  mutate_if(is.integer,as.double) %>%
  select(-Total)
neilfws
agf1997

1 Answer

A simple approach would be to use rvest::html_elements('table.css-86hwhl') (plural rather than singular), which extracts every HTML element matching the selector 'table.css-86hwhl' instead of only the first one. You can then manually choose the tables you want.

For example:

mammoth %>%
  html_elements('table.css-86hwhl') %>% 
  html_table(header= TRUE, convert = TRUE) 

gives a list of tibbles, one per matching table:

[[1]]
# A tibble: 53 × 13
   Season  `Pre-Oct`   Oct   Nov   Dec   Jan   Feb   Mar   Apr   May   Jun   Jul Total
   <chr>       <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
 1 1969-70        22     0   0    41    78    30.5  46    27     0       0     0  244.
 2 1970-71        60     0   0   109    29    19.5  24    14     0       0     0  256.
 3 1971-72        22     0   9   140.   32.2  11     1    53.5   0       0     0  268.
 4 1972-73         4     0  57.1  64.5  84.9 103    43    10     4       0     0  370.
 5 1973-74        45     0   0    45    87.5   9    82    38     0       0     0  306.
 6 1974-75        15     0  13    58.5  26   101    90    75     0       0     0  378.
 7 1975-76        27     0   0    14.5  13.5  54    50    38.5   0       0     0  198.
 8 1976-77         4     0   0     0    26    27    37     0     0       0     0   94 
 9 1977-78         6     0  26    98    95.5  97    85.5  78.5   1       0     0  488.
10 1978-79         6     0  29.5  51.5 102.   96    78    11.5  11.5     0     0  386.
# … with 43 more rows
# ℹ Use `print(n = ...)` to see more rows

[[2]]
# A tibble: 4 × 3
  Date       Inches `Season Total to Date`
  <chr>      <chr>  <chr>                 
1 November 8 "15\"" "28\""                
2 November 7 "2\""  "13\""                
3 November 3 "5\""  "11\""                
4 November 2 "6\""  "6\""                 

[[3]]
# A tibble: 53 × 13
   Season  `Pre-Oct`   Oct   Nov   Dec   Jan   Feb   Mar   Apr   May   Jun   Jul Total
   <chr>       <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
 1 1969-70        22     0   0    41    78    30.5  46    27     0       0     0  244.
 2 1970-71        60     0   0   109    29    19.5  24    14     0       0     0  256.
 3 1971-72        22     0   9   140.   32.2  11     1    53.5   0       0     0  268.
 4 1972-73         4     0  57.1  64.5  84.9 103    43    10     4       0     0  370.
 5 1973-74        45     0   0    45    87.5   9    82    38     0       0     0  306.
 6 1974-75        15     0  13    58.5  26   101    90    75     0       0     0  378.
 7 1975-76        27     0   0    14.5  13.5  54    50    38.5   0       0     0  198.
 8 1976-77         4     0   0     0    26    27    37     0     0       0     0   94 
 9 1977-78         6     0  26    98    95.5  97    85.5  78.5   1       0     0  488.
10 1978-79         6     0  29.5  51.5 102.   96    78    11.5  11.5     0     0  386.
# … with 43 more rows
# ℹ Use `print(n = ...)` to see more rows

[[4]]
# A tibble: 4 × 3
  Date       Inches `Season Total to Date`
  <chr>      <chr>  <chr>                 
1 November 8 "15\"" "28\""                
2 November 7 "2\""  "13\""                
3 November 3 "5\""  "11\""                
4 November 2 "6\""  "6\""                 

[[5]]
# A tibble: 53 × 13
   Season  `Pre-Oct`   Oct   Nov   Dec   Jan   Feb   Mar   Apr   May   Jun   Jul Total
   <chr>       <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
 1 1969-70        22     0   0    41    78    30.5  46    27     0       0     0  244.
 2 1970-71        60     0   0   109    29    19.5  24    14     0       0     0  256.
 3 1971-72        22     0   9   140.   32.2  11     1    53.5   0       0     0  268.
 4 1972-73         4     0  57.1  64.5  84.9 103    43    10     4       0     0  370.
 5 1973-74        45     0   0    45    87.5   9    82    38     0       0     0  306.
 6 1974-75        15     0  13    58.5  26   101    90    75     0       0     0  378.
 7 1975-76        27     0   0    14.5  13.5  54    50    38.5   0       0     0  198.
 8 1976-77         4     0   0     0    26    27    37     0     0       0     0   94 
 9 1977-78         6     0  26    98    95.5  97    85.5  78.5   1       0     0  488.
10 1978-79         6     0  29.5  51.5 102.   96    78    11.5  11.5     0     0  386.
# … with 43 more rows
# ℹ Use `print(n = ...)` to see more rows

You can then extract [[1]] and [[2]], the two tables you are looking for, and go from there. The remaining elements appear to be repeats of those two. I'm sure there's a more principled approach out there, but this should do the job.
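
As a rough sketch of one slightly more robust variant: instead of relying on list positions, you could pick each table by a column name it contains, then strip the inch marks so the values become numeric. The HTML below is a made-up stand-in for the real page (built with rvest::minimal_html() so the example runs offline), and the names history_tbl, season_tbl, page, history, current and monthly are only illustrative. It also assumes every date starts with the month name.

```r
library(rvest)

# Hypothetical stand-in for the real page: two tables sharing one class,
# repeated to mimic the duplicated list shown above.
history_tbl <- '<table class="css-86hwhl">
  <tr><th>Season</th><th>Nov</th><th>Dec</th><th>Total</th></tr>
  <tr><td>1969-70</td><td>0</td><td>41</td><td>41</td></tr>
  <tr><td>1970-71</td><td>0</td><td>109</td><td>109</td></tr>
</table>'
season_tbl <- '<table class="css-86hwhl">
  <tr><th>Date</th><th>Inches</th><th>Season Total to Date</th></tr>
  <tr><td>November 8</td><td>15"</td><td>28"</td></tr>
  <tr><td>November 7</td><td>2"</td><td>13"</td></tr>
</table>'
page <- minimal_html(paste0(history_tbl, season_tbl, history_tbl, season_tbl))

# Same call as above: one tibble per matching <table>.
tables <- page %>%
  html_elements("table.css-86hwhl") %>%
  html_table(header = TRUE, convert = TRUE)

# Select by column name rather than by position, keeping the first match.
history <- Filter(function(tbl) "Season" %in% names(tbl), tables)[[1]]
current <- Filter(function(tbl) "Date" %in% names(tbl), tables)[[1]]

# Remove the trailing inch mark and convert to numeric.
current$Inches <- as.numeric(gsub('"', "", current$Inches, fixed = TRUE))

# Abbreviate the month ("November 8" becomes "Nov") and sum snowfall per month.
current$Month <- substr(current$Date, 1, 3)
monthly <- tapply(current$Inches, current$Month, sum)
```

Here monthly is a named vector of per-month totals whose names line up with the month columns of history, which should make it easier to assemble the new season row. Selecting by column name should keep working if the page reorders or repeats the tables, though it would still break if the column headers themselves change.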

Josh White