I would like to split the strings in one column of my dataframe into a numeric part and a unit part using stringr.
The following is my dataframe:
df <- data.frame(ID = 1:26,
                 DRUG_STRENGTH = c("50 MG", "1250 MG", "20 MG", "200 MG", "2MG", "60MG", NA, "300IU",
                                   NA, "600 MG", "500MG", "625MG", NA, NA, "50MG/ML", "40MG", "200MG",
                                   "200MG", "200MG", "5 MG", "5 MG", "200MG", "300IU/3ML", "0.05%",
                                   "112.5 BILLION", "10.8MG"))
My desired dataframe is:
# > df
#    ID DRUG_STRENGTH DRUG_STRENGTH_NO DRUG_STRENGTH_UNIT
# 1   1         50 MG               50                 MG
# 2   2       1250 MG             1250                 MG
# 3   3         20 MG               20                 MG
# 4   4        200 MG              200                 MG
# 5   5           2MG                2                 MG
# 6   6          60MG               60                 MG
# 7   7          <NA>             <NA>               <NA>
# 8   8         300IU              300                 IU
# 9   9          <NA>             <NA>               <NA>
# 10 10        600 MG              600                 MG
# 11 11         500MG              500                 MG
# 12 12         625MG              625                 MG
# 13 13          <NA>             <NA>               <NA>
# 14 14          <NA>             <NA>               <NA>
# 15 15       50MG/ML               50              MG/ML
# 16 16          40MG               40                 MG
# 17 17         200MG              200                 MG
# 18 18         200MG              200                 MG
# 19 19         200MG              200                 MG
# 20 20          5 MG                5                 MG
# 21 21          5 MG                5                 MG
# 22 22         200MG              200                 MG
# 23 23     300IU/3ML              300             IU/3ML
# 24 24         0.05%             0.05                  %
# 25 25 112.5 BILLION            112.5            BILLION
# 26 26        10.8MG             10.8                 MG
My code below produces the desired dataframe, but is there a cleaner way to write the regular expressions?
library(dplyr)    # for %>% and mutate()
library(stringr)  # for str_extract(), str_replace(), str_trim()
df <- df %>%
  mutate(DRUG_STRENGTH_NO = str_extract(DRUG_STRENGTH, pattern = "^\\d\\.?\\d?\\.?\\d?\\.?\\d*"),
         DRUG_STRENGTH_UNIT = str_trim(str_replace(DRUG_STRENGTH, pattern = "^\\d\\.?\\d?\\.?\\d?\\.?\\d*", replacement = "")))
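For comparison, here is one possible simplification, offered as a sketch rather than a definitive answer. It assumes every non-missing value starts with digits containing at most one decimal point, which holds for the sample data above but may not hold for the full dataset. A single pattern with two capture groups is passed to `str_match()`, so the number and the unit come out of one match. The names `strength`, `parts`, and `result` are illustrative only, and the input is a subset of the values from the question.

```r
library(stringr)

# A subset of the DRUG_STRENGTH values from the question, including an NA.
strength <- c("50 MG", "2MG", NA, "50MG/ML", "300IU/3ML", "0.05%",
              "112.5 BILLION", "10.8MG")

# Group 1: one or more digits with an optional decimal part.
# \\s* skips any whitespace between the number and the unit.
# Group 2: the rest of the string, taken as the unit.
parts <- str_match(strength, "^(\\d+(?:\\.\\d+)?)\\s*(.*)$")

# Column 1 of the matrix is the full match; columns 2 and 3 are the groups.
result <- data.frame(
  DRUG_STRENGTH      = strength,
  DRUG_STRENGTH_NO   = parts[, 2],
  DRUG_STRENGTH_UNIT = parts[, 3],
  stringsAsFactors   = FALSE
)
print(result)
```

Missing inputs stay `NA` in both new columns. Unlike the `str_trim()` version, this pattern does not strip trailing whitespace from the unit, so wrap `parts[, 3]` in `str_trim()` if the real data may contain it.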