Regex to extract two specific words from string

Question

I am parsing some files. I had planned to extract the information from inside each file, but this failed because of special characters. The words I need are also contained in the file name, but the file name contains other text as well.

I assume these can be extracted with a suitable regular expression, but I have not managed to write one. The origin is the word between the second-to-last and the last underscore; the destination is the word between the last underscore and `.rds`.

name1<-"2020-06-15 11_41_40_Magdeburg_Bitterfeld-Wolfen.rds"
name2<-"2020-06-15 11_41_53_Niebüll_Sylt OT Westerland.rds"
name3<-"2020-06-15 11_41_57_Augsburg_Düsseldorf.rds"

I parse each file separately; here are three examples. I would expect:

name1_origin <- "Magdeburg"
name1_dest   <- "Bitterfeld-Wolfen"
name2_origin <- "Niebüll"
name2_dest   <- "Sylt OT Westerland"
name3_origin <- "Augsburg"
name3_dest   <- "Düsseldorf"
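The rule stated above (origin and destination are the last two underscore-separated fields before `.rds`) can also be expressed without a regular expression. A minimal base-R sketch, assuming the origin and destination names never contain an underscore themselves; `route_from_filename()` is a hypothetical helper name, not something from the question:

```r
# Hypothetical helper: returns the origin and destination encoded in a
# file name such as "2020-06-15 11_41_40_Magdeburg_Bitterfeld-Wolfen.rds".
# Assumes neither place name contains an underscore.
route_from_filename <- function(filename) {
  # Drop the ".rds" extension at the end of the name
  stem <- sub("\\.rds$", "", filename)
  # Split on literal underscores; the last two pieces are origin and destination
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  n <- length(parts)
  list(origin = parts[n - 1], dest = parts[n])
}

name1 <- "2020-06-15 11_41_40_Magdeburg_Bitterfeld-Wolfen.rds"
route1 <- route_from_filename(name1)
name1_origin <- route1$origin  # "Magdeburg"
name1_dest <- route1$dest      # "Bitterfeld-Wolfen"
```

Splitting with `fixed = TRUE` treats `_` literally, so the umlauts, spaces and hyphens inside the place names never interact with regex syntax.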

score 0 · Accepted Answer · answered Jun 15 '20 at 10:00


You can use `str_match` from the stringr package:

stringr::str_match(c(name1, name2, name3), '.*_(.*)_(.*)\\.rds')[, -1]

#     [,1]        [,2]                
#[1,] "Magdeburg" "Bitterfeld-Wolfen" 
#[2,] "Niebüll"   "Sylt OT Westerland"
#[3,] "Augsburg"  "Düsseldorf"
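If you would rather avoid the stringr dependency, base R's `regexec()` and `regmatches()` can apply the same pattern. This is a sketch, not part of the original answer: it passes `perl = TRUE` (my choice) so that greedy matching follows the same backtracking rules as the ICU engine behind `str_match()`, and it assumes every file name matches the pattern (a non-matching name would yield `character(0)` and break the matrix layout).

```r
# Base-R sketch: same pattern as the str_match() answer, no stringr needed
files <- c("2020-06-15 11_41_40_Magdeburg_Bitterfeld-Wolfen.rds",
           "2020-06-15 11_41_53_Niebüll_Sylt OT Westerland.rds",
           "2020-06-15 11_41_57_Augsburg_Düsseldorf.rds")

# regexec() records where the whole match and each capture group start;
# regmatches() turns that into one vector per file: c(full_match, group1, group2)
matches <- regmatches(files, regexec(".*_(.*)_(.*)\\.rds", files, perl = TRUE))

# Stack the vectors into a matrix and drop the full-match column,
# mirroring str_match(...)[, -1]
routes <- do.call(rbind, matches)[, -1]
colnames(routes) <- c("origin", "dest")
routes
```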


Ronak Shah


Thanks, I wish I understood how this works, but it works :) – Max M Jun 15 '20 at 10:05
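Each `(.*)` is a capture group that captures the text between the underscores. Regular expressions are greedy by default, so the leading `.*` consumes as many characters as possible while still letting the rest of the pattern match; that pushes the two groups onto the last two underscore-separated fields before `.rds`. – Ronak Shah Jun 15 '20 at 10:18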
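To see the greediness described in the comment above, compare the pattern with a lazy `.*?` at the start. This sketch uses base R's `sub()` with `perl = TRUE` (an assumption, chosen because PCRE's lazy quantifier is well defined) and extracts only the first capture group:

```r
name1 <- "2020-06-15 11_41_40_Magdeburg_Bitterfeld-Wolfen.rds"

# Greedy leading .* : it grabs as much as it can, so the two groups land on
# the last two underscore-separated fields
greedy <- sub("^.*_(.*)_(.*)\\.rds$", "\\1", name1, perl = TRUE)
greedy  # "Magdeburg"

# Lazy leading .*? : it stops at the first underscore, so group 1 (still
# greedy) absorbs everything up to the last underscore
lazy <- sub("^.*?_(.*)_(.*)\\.rds$", "\\1", name1, perl = TRUE)
lazy    # "41_40_Magdeburg"
```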

