I'm trying to scrape all competitor information, such as each competitor's division, gender, belt, and weight, from this website. The end goal is to put all competitor information from this page into one data frame.
First question: The division, gender, belt, and weight appear only once at the top of the page, but I want R to fill in this information automatically next to each competitor's name in a data frame. How can I code this so that the appropriate information is filled in next to each competitor?
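For reference, this is the behaviour I am hoping for, shown as a minimal sketch with made-up values rather than the real scraped data. The names, gyms, and category values below are placeholders I invented for illustration. It relies on the fact that data.frame() recycles length-one vectors across all rows:

```r
# Hypothetical per-competitor values standing in for the scraped vectors.
name <- c("Competitor A", "Competitor B", "Competitor C")
gym  <- c("Gym X", "Gym Y", "Gym Z")

# Category-level values: a single string each, as they appear once in the
# page header. These particular strings are invented placeholders.
ageDivision <- "Adult"
gender      <- "Male"
belt        <- "Black"
weight      <- "Light"

# data.frame() recycles the length-one vectors to match the three rows.
competitors <- data.frame(
  name     = name,
  gym      = gym,
  division = ageDivision,
  gender   = gender,
  belt     = belt,
  weight   = weight
)

print(competitors)
```

This only works when the per-competitor vectors all have the same length, which is exactly where my real code breaks down.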
Second question: How can I insert NA for missing information, such as the date or competitor number?
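One approach I have seen suggested, sketched here on a tiny made-up HTML fragment rather than the real page, is to first select one parent node per competitor and then call html_element() (singular) on that node set. As far as I understand, html_element() returns exactly one result per parent, and html_text() gives NA where nothing matched. The wrapper class .match-card__competitor is an assumption on my part; I have not confirmed that the real page uses it.

```r
library(rvest)

# Made-up HTML standing in for two competitor cards.
# The second card deliberately has no competitor number.
page <- minimal_html('
  <div class="match-card__competitor">
    <span class="match-card__competitor-n">12</span>
    <div class="match-card__club-name">Gym X</div>
  </div>
  <div class="match-card__competitor">
    <div class="match-card__club-name">Gym Y</div>
  </div>')

# One node per competitor (hypothetical wrapper class, see note above).
cards <- html_elements(page, ".match-card__competitor")

# html_element() keeps one entry per card, so every column has the same
# length and missing fields come through as NA.
competitors <- data.frame(
  competitor = html_text(html_element(cards, ".match-card__competitor-n"), trim = TRUE),
  gym        = html_text(html_element(cards, ".match-card__club-name"), trim = TRUE)
)

print(competitors)
```

html_elements() and html_element() are the rvest 1.0.0+ names for html_nodes() and html_node(), which my code below still uses.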
Because the scraped vectors have different lengths, my code cannot place any of the scraped data into a data frame.
library(rvest)
library(tidyverse)

MensUrl <- read_html('https://www.bjjcompsystem.com/tournaments/1869/categories/2053147')

## SCRAPE FIGHT INFO -------------------------------------------
ageDivision <- MensUrl %>%
  html_nodes('.category-title__age-division') %>%
  html_text()

gender <- MensUrl %>%
  html_nodes('.category-title__age-division+ .category-title__label') %>%
  html_text()

belt <- MensUrl %>%
  html_nodes('.category-title__label:nth-child(3)') %>%
  html_text()

weight <- MensUrl %>%
  html_nodes('.category-title__label:nth-child(4)') %>%
  html_text()

fightAndMat <- MensUrl %>%
  html_nodes('.bracket-match-header__where , .bracket-match-header__fight') %>%
  html_text()

date <- MensUrl %>%
  html_nodes('.bracket-match-header__when') %>%
  html_text()

CompetitorNo <- MensUrl %>%
  html_nodes('.match-card__competitor-n') %>%
  html_text()

name <- MensUrl %>%
  html_nodes('.match-card__competitor-description div:nth-child(1)') %>%
  html_text()

gym <- MensUrl %>%
  html_nodes('.match-card__club-name') %>%
  html_text()

# create match df (this is the step that fails because the vectors differ in length)
matches <- data.frame('division' = ageDivision,
                      'gender' = gender,
                      'belt' = belt,
                      'weight' = weight,
                      'fightAndMat' = fightAndMat,
                      'date' = date,
                      'competitor' = CompetitorNo,
                      'name' = name,
                      'gym' = gym)
This is similar to what the end data frame should look like: