I'm trying to get the range of the numbers at the end of this link: https://schedule.sxsw.com/2019/speakers/2008434

Each link ends in a number, e.g. 2008434. The links point to the bios of speakers at the upcoming South by Southwest (SXSW) festival. I know there are 3729 speakers in total, but that doesn't tell me how each speaker and their page are numbered.
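Because the IDs don't follow a predictable sequence, one option is to read them off links that already exist rather than enumerating every possible number. The sketch below only covers the parsing step. It assumes every bio URL ends in `/speakers/<digits>` (optionally with a trailing slash), as the example link does. `extract_speaker_id` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: pull the numeric speaker ID from the end of a bio URL.
# Assumes URLs look like ".../speakers/<digits>" or ".../speakers/<digits>/";
# anything else (e.g. a session link) becomes NA.
extract_speaker_id <- function(urls) {
  is_speaker <- grepl("/speakers/[0-9]+/?$", urls)
  ids <- ifelse(is_speaker,
                sub(".*/speakers/([0-9]+)/?$", "\\1", urls),
                NA_character_)
  as.integer(ids)
}

extract_speaker_id(c(
  "https://schedule.sxsw.com/2019/speakers/2008434",
  "/2019/speakers/2008434/"
))
```

If some listing page links to every speaker, its hrefs could be collected with `html_nodes(page, "a") %>% html_attr("href")` and passed through this function. That would give the actual IDs instead of a guessed range. I haven't confirmed which listing URL, if any, SXSW uses for this.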
I'm trying to do some simple web scraping with `lapply`, but my function doesn't work when I can't specify the range of IDs. For example, I used:

```r
number_range <- seq(1:3000000)
```

and got a lot of these errors:

```
Error in open.connection(x, "rb") : HTTP error 404.
```

Clicking around the links shows no obvious pattern in how they are numbered.

Is there an easy way to get this range, or to make this function work? Code below:
```r
library(rvest)
library(tidyverse)

# Candidate speaker IDs (most of these are not real speakers and return HTTP 404)
number_range <- seq(1:3000000)

# Scrape the speaker name from each bio page
sxsw_bios <- lapply(number_range, function(y) {
  # Build the bio URL for ID y, read the page, and extract the name
  Name <- read_html(paste0("https://schedule.sxsw.com/2019/speakers/", y)) %>%
    html_nodes(".speaker-name") %>%
    html_text()
  Name
})
```
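The 404s abort `lapply` because `read_html` raises an R error for any ID that isn't a speaker page. A minimal sketch, assuming you only want the loop to skip missing IDs, wraps each request in `tryCatch` and returns `NA` instead. `get_speaker_name` is a hypothetical helper, and the `.speaker-name` selector is taken from the code above.

```r
library(rvest)

# Hypothetical helper: return the first ".speaker-name" text for one ID,
# or NA if the page can't be read (e.g. HTTP 404) or has no such node.
get_speaker_name <- function(id,
                             base_url = "https://schedule.sxsw.com/2019/speakers/") {
  tryCatch(
    {
      page <- read_html(paste0(base_url, id))
      name <- html_text(html_nodes(page, ".speaker-name"))
      if (length(name) == 0) NA_character_ else name[1]
    },
    error = function(e) NA_character_  # swallow 404s and other fetch errors
  )
}
```

Usage: `vapply(c(2008434, 2008435), get_speaker_name, character(1))` returns one name or `NA` per ID. This doesn't solve the numbering problem. Looping over 3,000,000 IDs this way is still millions of HTTP requests, so a known list of IDs remains the better input.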