Extract part of URL in R

Question

dput(mydf)
structure(list(urls = c("/players/a/abdulma02.html", 
"/players/a/abdulta01.html", 
"/players/a/abdursh01.html", "/players/a/alexaco01.html", "/players/a/alexaco02.html"
), names = c("Mahmoud Abdul-Rauf", "Tariq Abdul-Wahad", "Shareef Abdur-Rahim", 
"Cory Alexander", "Courtney Alexander")), row.names = c(NA, 5L
), class = "data.frame")

head(mydf)
                       urls               names
1 /players/a/abdulma02.html  Mahmoud Abdul-Rauf
2 /players/a/abdulta01.html   Tariq Abdul-Wahad
3 /players/a/abdursh01.html Shareef Abdur-Rahim
4 /players/a/alexaco01.html      Cory Alexander
5 /players/a/alexaco02.html  Courtney Alexander

My problem is simple: I would like to extract the part of each URL before .html (abdulma02, abdulta01, etc.). The data is formatted such that the ending is always .html and the start is always /players/{single letter}/{what I want}.html

I took a stab at this with the new urltools library, without success (I tried its urltools::suffix_extract() function). Any help here is appreciated.

score 1 · Accepted Answer · answered Dec 07 '18 at 18:22

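We can use basename() to drop the directory part of each path and tools::file_path_sans_ext() to strip the .html extension: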

tools::file_path_sans_ext(basename(mydf$urls))
#[1] "abdulma02" "abdulta01" "abdursh01" "alexaco01" "alexaco02"
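
For comparison, here is a minimal sketch of a base R regex alternative, checked against the approach above. It assumes every URL follows the /players/{single letter}/{id}.html pattern described in the question. The names urls, ids_tools and ids_regex are illustrative only, not part of the original post.

```r
# Sample data copied from the question (assumed pattern: /players/<letter>/<id>.html)
urls <- c("/players/a/abdulma02.html", "/players/a/abdulta01.html",
          "/players/a/abdursh01.html", "/players/a/alexaco01.html",
          "/players/a/alexaco02.html")

# Approach from the answer: basename() removes the directories,
# file_path_sans_ext() removes the trailing extension.
ids_tools <- tools::file_path_sans_ext(basename(urls))

# Regex alternative: capture the text between the last "/" and ".html".
ids_regex <- sub("^.*/([^/]+)\\.html$", "\\1", urls)

# Both approaches should agree on this data.
identical(ids_tools, ids_regex)
```

The regex version leaves a string unchanged if it does not match the pattern, so it is only a safe substitute when the format guarantee from the question holds. To keep the result alongside the original data, it can be assigned to a new column, for example mydf$id <- tools::file_path_sans_ext(basename(mydf$urls)).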

akrun


gotta bookmark this post - greatly appreciated @akrun – Canovice Dec 07 '18 at 18:24
