I have the following data frame:
df <- data.frame(
  Name = c('AMLOD VALSAR HCT MPH Filmtabl 10+160+25mg 100Stk', 'ARTHROTEC 50 Bitabs 50+0.2mg 50Stk', 'GLUCOPHAGE Filmtabl 850mg 100Stk'),
  Aug20Cu = c(1000, 1831, 7430),
  Sep20Cu = c(899, 822, 1000)
)
                                              Name Aug20Cu Sep20Cu
1 AMLOD VALSAR HCT MPH Filmtabl 10+160+25mg 100Stk    1000     899
2               ARTHROTEC 50 Bitabs 50+0.2mg 50Stk    1831     822
3                 GLUCOPHAGE Filmtabl 850mg 100Stk    7430    1000
I would like to extract the numbers that make up the dose in the first column, "Name" (the part just before "mg"), into separate columns, to get the following result:
                                              Name   a   b  c Aug20Cu Sep20Cu
1 AMLOD VALSAR HCT MPH Filmtabl 10+160+25mg 100Stk  10 160 25    1000     899
2               ARTHROTEC 50 Bitabs 50+0.2mg 50Stk  50 0.2 NA    1831     822
3                 GLUCOPHAGE Filmtabl 850mg 100Stk 850  NA NA    7430    1000
I have tried the following code:
df<-df %>% tidyr::extract(Name,c("a", "b", "c"),'(\\d+(?=\\+))(\\d+(?=\\+))(\\d+(?=mg))',convert=TRUE, remove=FALSE)
or
df<-df %>% tidyr::extract(Name,c("a", "b", "c"),'(\d+(?=\+|mg))',convert=TRUE, remove=FALSE)
I don't really understand regex, so I have no idea what I'm doing wrong. I built the last pattern on regex101.com, where it seems to work, but as soon as I try it in R I get a weird result (the first letter of each string in Name).
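For reference, here is a minimal base-R sketch of one way the desired table could be produced. It is not a fix of the tidyr::extract calls above. It assumes the dose is always the run of numbers joined by "+" that sits directly before "mg", and that there are at most three components. The names `m`, `dose`, `parts` and `result` are made up for illustration. One thing it makes visible: a lookahead such as (?=\+) checks the next character without consuming it, so in the first pattern the second \d+ would have to start on the "+" itself, and that pattern can never match.

```r
df <- data.frame(
  Name = c("AMLOD VALSAR HCT MPH Filmtabl 10+160+25mg 100Stk",
           "ARTHROTEC 50 Bitabs 50+0.2mg 50Stk",
           "GLUCOPHAGE Filmtabl 850mg 100Stk"),
  Aug20Cu = c(1000, 1831, 7430),
  Sep20Cu = c(899, 822, 1000),
  stringsAsFactors = FALSE
)

# Locate the dose token directly before "mg", e.g. "10+160+25".
# The lookahead (?=mg) needs perl = TRUE; rows without a dose stay NA.
m <- regexpr("[0-9.]+(\\+[0-9.]+)*(?=mg)", df$Name, perl = TRUE)
dose <- rep(NA_character_, nrow(df))
dose[m > 0] <- regmatches(df$Name, m)

# Split each token on the literal "+" and pad to three components,
# so every row yields exactly three numeric values (missing ones are NA).
parts <- strsplit(dose, "+", fixed = TRUE)
parts <- t(vapply(parts, function(p) as.numeric(p[1:3]), numeric(3)))
colnames(parts) <- c("a", "b", "c")

# Put the new columns between Name and the monthly columns.
result <- cbind(df["Name"], parts, df[c("Aug20Cu", "Sep20Cu")])
print(result)
```

Splitting the token after extracting it avoids having to write one capture group per possible component, which is where the attempts above get stuck.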