gsub is a function that lets us extract and replace patterns in strings, but I'm having a hard time understanding its underlying logic. For example, I want to extract the last part of each of these strings (the extension):

files = c(
  "tmp-project.csv", "project.csv", 
  "project2-csv-specs.csv", "project2.csv2.specs.xlsx", 
  "project_cars.ods", "project-houses.csv", 
  "Project_Trees.csv","project-cars.R",
  "project-houses.r", "project-final.xls", 
  "Project-final2.xlsx"
)

gsub("\\.[a-zA-Z]*$", "\\1" ,files)

What I get is everything except the part I want:

 [1] "tmp-project"         "project"             "project2-csv-specs" 
 [4] "project2.csv2.specs" "project_cars"        "project-houses"     
 [7] "Project_Trees"       "project-cars"        "project-houses"     
[10] "project-final"       "Project-final2" 

What am I doing wrong, and what's the logic of gsub? I know the stringr package handles this kind of problem easily, but I'm looking for a base R solution. Thank you.

Alejandro Carrera
  • You use `\1` as the replacement pattern, but have not defined any capturing group in the pattern. `gsub` replaces matches. To extract with `gsub`, you need to match the whole string and capture what you need to keep. So, `gsub(".*\\.([a-zA-Z]*)$", "\\1", files)` – Wiktor Stribiżew Feb 13 '20 at 21:32
  • This definitely answers all my questions. Thanks a lot, I had a very hard time struggling with gsub. – Alejandro Carrera Feb 13 '20 at 21:37
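
To make the logic from the comment above concrete, here is a minimal sketch on a shortened copy of the `files` vector. The variable names `stripped`, `ext`, and `ext2` are only illustrative. `sub` is used instead of `gsub` because each string contains at most one match anchored at the end, so the two behave the same here. The last line uses `tools::file_ext`, which ships with R, as an alternative that does not come from the thread.

```r
# Shortened copy of the vector from the question
files <- c("tmp-project.csv", "project2.csv2.specs.xlsx", "project-cars.R")

# Original attempt: the pattern matches only the extension and has no
# capturing group, so "\\1" is empty and the match is replaced by nothing.
stripped <- gsub("\\.[a-zA-Z]*$", "\\1", files)
stripped
# [1] "tmp-project"         "project2.csv2.specs" "project-cars"

# Fix: match the WHOLE string, capture only the letters after the last dot,
# and replace the whole match with that captured group.
ext <- sub(".*\\.([a-zA-Z]*)$", "\\1", files)
ext
# [1] "csv"  "xlsx" "R"

# Alternative without writing a regex by hand (tools is part of the R distribution)
ext2 <- tools::file_ext(files)
ext2
# [1] "csv"  "xlsx" "R"
```

Note that the `sub` version returns a string unchanged when the pattern does not match (for example, a name with no dot), so it assumes every file name ends in a dot followed by letters.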
