I am parsing some files and had planned to extract the information from within each file, but this failed because of special characters. The words I need are also contained in the filename, along with other material.

I assume they could be extracted with a suitable regular expression, but I have not managed to write one. The origin is the word between the second-to-last and the last underscore. The destination is the word between the last underscore and the .rds extension.

name1<-"2020-06-15 11_41_40_Magdeburg_Bitterfeld-Wolfen.rds"
name2<-"2020-06-15 11_41_53_Niebüll_Sylt OT Westerland.rds"
name3<-"2020-06-15 11_41_57_Augsburg_Düsseldorf.rds"

I am parsing each file separately and have provided three examples. I would expect:

name1_origin<-"Magdeburg"
name1_dest<- "Bitterfeld-Wolfen"
name2_origin<-"Niebüll"
name2_dest<- "Sylt OT Westerland"
name3_origin<-"Augsburg"
name3_dest<- "Düsseldorf"
Wiktor Stribiżew
Max M

1 Answer

You can use `str_match` from the stringr package:

stringr::str_match(c(name1, name2, name3), '.*_(.*)_(.*)\\.rds')[, -1]

#     [,1]        [,2]                
#[1,] "Magdeburg" "Bitterfeld-Wolfen" 
#[2,] "Niebüll"   "Sylt OT Westerland"
#[3,] "Augsburg"  "Düsseldorf"        
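
For readers who would rather not depend on stringr, the same idea can be sketched in base R. This is only an illustration, not part of the original answer: the names `files`, `m`, and `parts` are chosen here, and the pattern uses `[^_]*` (any run of non-underscore characters) instead of `.*`, assuming that neither the origin nor the destination ever contains an underscore.

```r
# The three example filenames from the question
files <- c(
  "2020-06-15 11_41_40_Magdeburg_Bitterfeld-Wolfen.rds",
  "2020-06-15 11_41_53_Niebüll_Sylt OT Westerland.rds",
  "2020-06-15 11_41_57_Augsburg_Düsseldorf.rds"
)

# regexec() returns the full match plus one entry per capture group;
# regmatches() turns those positions into the matched strings.
m <- regmatches(files, regexec("_([^_]*)_([^_]*)\\.rds$", files))

# Bind the per-file results into a matrix and drop the full-match column
parts <- do.call(rbind, m)[, -1, drop = FALSE]
colnames(parts) <- c("origin", "dest")
parts
```

Anchoring the pattern with `$` ties the match to the end of the filename, so only the last two underscore-separated fields can be captured.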
Ronak Shah
  • Thanks. I wish I understood how this works, but it works :) – Max M Jun 15 '20 at 10:05
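  • `(.*)` is a capture group that captures the text between underscores. Regex quantifiers are greedy by default, so the leading `.*` consumes as many characters as it can while still allowing the rest of the pattern to match. That pushes the two capture groups onto the last two underscore-separated fields. – Ronak Shah Jun 15 '20 at 10:18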
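
The effect of greediness described in the comment above can be seen by comparing the greedy leading `.*` with a lazy `.*?`. This is a small sketch using base R `sub()` with `perl = TRUE` (a choice made here so that backtracking and lazy quantifiers behave as in PCRE); the variable names are illustrative and not from the original answer.

```r
x <- "2020-06-15 11_41_40_Magdeburg_Bitterfeld-Wolfen.rds"

# Greedy leading `.*`: swallows as much as possible, so the two groups
# end up on the last two underscore-separated fields.
greedy_origin <- sub(".*_(.*)_(.*)\\.rds", "\\1", x, perl = TRUE)
greedy_dest   <- sub(".*_(.*)_(.*)\\.rds", "\\2", x, perl = TRUE)
greedy_origin  # "Magdeburg"
greedy_dest    # "Bitterfeld-Wolfen"

# Lazy leading `.*?`: stops at the first underscore, so the first group
# (itself greedy) stretches over everything up to the last underscore.
lazy_first <- sub(".*?_(.*)_(.*)\\.rds", "\\1", x, perl = TRUE)
lazy_first     # "41_40_Magdeburg"
```

The only difference between the two patterns is the `?` after the leading `.*`, which is why the greedy default is what makes the original one-liner pick out the origin.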