I have a dataframe that looks like this. I need to create a new column of values using the strings in PROJ_ID column and form the strings of values from the PROJ_NAME column.
The solutions provided here (Accumulate values from one column based on keys of another column in r) take a long time to run (and do not produce correct output with case shown below), so I used @r2evans recommendation.
PROJ_ID PROJ_NAME
1 KA0034 A JST#3
2 KA0034.10 A JST#3-Dares
3 KA0034.10.110201 A JST#3-Dares
4 KA0034.10.110201.LOV VOM
5 KA0034.10.110201.LOV.MAX A JST#3-Dares
6 KA0034.FN Some Invent
7 KA0034.FN.010XYZ Some Invent
8 KA0034.FN.010XYZ.LEX Some Invent
9 KA0034.FN.010XYZ.LEX.NAT A JST#3
input <- data.frame(
PROJ_ID = c("KA0034",
"KA0034.10",
"KA0034.10.110201",
"KA0034.10.110201.LOV",
"KA0034.10.110201.LOV.MAX",
"KA0034.FN",
"KA0034.FN.010XYZ",
"KA0034.FN.010XYZ.LEX",
"KA0034.FN.010XYZ.LEX.NAT"),
PROJ_NAME = c("A JST#3",
"A JST#3-Dares",
"A JST#3-Dares",
"VOM",
"A JST#3-Dares",
"Some Invent",
"Some Invent",
"Some Invent",
"A JST#3")
)
fun <- function(st) strcapture("(.*)[.][^.]+$", st, list(L=""))$L
input <- input %>%
mutate(K = fun(PROJ_ID))
while (TRUE) {
input <- left_join(input, select(input, PROJ_ID, iss = PROJ_NAME), by = c("K" = "PROJ_ID")) %>%
mutate(
PROJ_NAME = if_else(is.na(iss), PROJ_NAME,
if_else(PROJ_ID == K, PROJ_NAME, paste(iss, PROJ_NAME, sep = "."))),
K = fun(K)) %>%
select(-iss)
if (all(is.na(input$K))) break
}
input$K <- NULL
# Update the PROJ_NAME column to include repeated parts for each unique PROJ_NAME
input$PROJ_NAME <- sapply(strsplit(as.character(input$PROJ_NAME), "\\."), function(x) {
unique_parts <- unique(x)
paste(unique_parts, collapse = ".")
})
# Print the updated dataframe
print(input)
Output:
PROJ_ID PROJ_NAME
1 KA0034 A JST#3
2 KA0034.10 A JST#3.A JST#3-Dares
3 KA0034.10.110201 A JST#3.A JST#3-Dares
4 KA0034.10.110201.LOV A JST#3.A JST#3-Dares.VOM
5 KA0034.10.110201.LOV.MAX A JST#3.A JST#3-Dares.VOM
6 KA0034.FN A JST#3.Some Invent
7 KA0034.FN.010XYZ A JST#3.Some Invent
8 KA0034.FN.010XYZ.LEX A JST#3.Some Invent
9 KA0034.FN.010XYZ.LEX.NAT A JST#3.Some Invent
Output needed:
PROJ_ID PROJ_NAME
1 KA0034 A JST#3
2 KA0034.10 A JST#3.A JST#3-Dares
3 KA0034.10.110201 A JST#3.A JST#3-Dares.A JST#3-Dares
4 KA0034.10.110201.LOV A JST#3.A JST#3-Dares.A JST#3-Dares.VOM
5 KA0034.10.110201.LOV.MAX A JST#3.A JST#3-Dares.A JST#3-Dares.VOM.A JST#3-Dares
6 KA0034.FN A JST#3.Some Invent
7 KA0034.FN.010XYZ A JST#3.Some Invent.Some Invent
8 KA0034.FN.010XYZ.LEX A JST#3.Some Invent.Some Invent.Some Invent
9 KA0034.FN.010XYZ.LEX.NAT A JST#3.Some Invent.Some Invent.Some Invent.A JST#3
Here the suffix (part after last period) gets concatenated to prefix string's values.
For example:
KA0034.10.110201.LOV
isVOM
(LOV is the suffix)- But the
KA0034.10.110201
isJST#3-Dares
(110201 is the suffix) - likewise
KA0034.10
isJST#3-Dares
(10 is the suffix) - and
KA0034
isA JST#3
so resultant string is A JST#3.A JST#3-Dares.A JST#3-Dares.VOM