dput(mydf)
structure(list(urls = c("/players/a/abdulma02.html", 
"/players/a/abdulta01.html", 
"/players/a/abdursh01.html", "/players/a/alexaco01.html", "/players/a/alexaco02.html"
), names = c("Mahmoud Abdul-Rauf", "Tariq Abdul-Wahad", "Shareef Abdur-Rahim", 
"Cory Alexander", "Courtney Alexander")), row.names = c(NA, 5L
), class = "data.frame")

head(mydf)
                       urls               names
1 /players/a/abdulma02.html  Mahmoud Abdul-Rauf
2 /players/a/abdulta01.html   Tariq Abdul-Wahad
3 /players/a/abdursh01.html Shareef Abdur-Rahim
4 /players/a/alexaco01.html      Cory Alexander
5 /players/a/alexaco02.html  Courtney Alexander

My problem is simple: I would like to extract the part of each URL before the .html extension (abdulma02, abdulta01, etc.). The data is formatted such that the ending will always be .html, and the start will always be /players/{single letter}/{what I want}.html

I have taken a stab at this using the new urltools library, without success (I tried its urltools::suffix_extract() function). Any help here is appreciated.

Canovice

1 Answer


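We can use basename() to drop the directory part of each path and tools::file_path_sans_ext() to strip the .html extension: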

tools::file_path_sans_ext(basename(mydf$urls))
#[1] "abdulma02" "abdulta01" "abdursh01" "alexaco01" "alexaco02"
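
As a possible alternative, and not part of the original answer, the same result can be sketched with a single base R regular expression. This assumes every path ends in .html and that the wanted part contains no slash, as described in the question; the vector name urls and the result name ids are only illustrative.

```r
# First few URLs from the question, copied into a plain character vector
urls <- c("/players/a/abdulma02.html",
          "/players/a/abdulta01.html",
          "/players/a/abdursh01.html")

# Capture the text between the last "/" and the trailing ".html"
ids <- sub("^.*/([^/]+)\\.html$", "\\1", urls)
ids
#[1] "abdulma02" "abdulta01" "abdursh01"
```

Note that sub() returns its input unchanged when the pattern does not match, so a path that breaks the assumed format would pass through unmodified rather than raise an error.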
akrun