3

I have a dataframe that looks like this. I need to create a new column of values using the strings in PROJ_ID column and form the strings of values from the PROJ_NAME column.

The solutions provided here (Accumulate values from one column based on keys of another column in r) take a long time to run (and do not produce correct output with case shown below), so I used @r2evans recommendation.

                   PROJ_ID     PROJ_NAME
1                   KA0034       A JST#3
2                KA0034.10 A JST#3-Dares
3         KA0034.10.110201 A JST#3-Dares
4     KA0034.10.110201.LOV           VOM
5 KA0034.10.110201.LOV.MAX A JST#3-Dares
6                KA0034.FN   Some Invent
7         KA0034.FN.010XYZ   Some Invent
8     KA0034.FN.010XYZ.LEX   Some Invent
9 KA0034.FN.010XYZ.LEX.NAT       A JST#3
input <- data.frame(
  
  PROJ_ID = c("KA0034",
              "KA0034.10",
              "KA0034.10.110201",
              "KA0034.10.110201.LOV",
              "KA0034.10.110201.LOV.MAX",
              "KA0034.FN",
              "KA0034.FN.010XYZ",
              "KA0034.FN.010XYZ.LEX",
              "KA0034.FN.010XYZ.LEX.NAT"),
  
  PROJ_NAME = c("A JST#3",
                "A JST#3-Dares",
                "A JST#3-Dares",
                "VOM",
                "A JST#3-Dares",
                "Some Invent",
                "Some Invent",
                "Some Invent",
                "A JST#3")
)


fun <- function(st) strcapture("(.*)[.][^.]+$", st, list(L=""))$L



input <- input %>%
  mutate(K = fun(PROJ_ID))



while (TRUE) {
  input <- left_join(input, select(input, PROJ_ID, iss = PROJ_NAME), by = c("K" = "PROJ_ID")) %>%
    mutate(
      PROJ_NAME = if_else(is.na(iss), PROJ_NAME,
                          if_else(PROJ_ID == K, PROJ_NAME, paste(iss, PROJ_NAME, sep = "."))),
      K = fun(K)) %>%
    select(-iss)
  if (all(is.na(input$K))) break
}



input$K <- NULL



# Update the PROJ_NAME column to include repeated parts for each unique PROJ_NAME
input$PROJ_NAME <- sapply(strsplit(as.character(input$PROJ_NAME), "\\."), function(x) {
  unique_parts <- unique(x)
  paste(unique_parts, collapse = ".")
})



# Print the updated dataframe
print(input)

Output:

                   PROJ_ID                 PROJ_NAME
1                   KA0034                   A JST#3
2                KA0034.10     A JST#3.A JST#3-Dares
3         KA0034.10.110201     A JST#3.A JST#3-Dares
4     KA0034.10.110201.LOV A JST#3.A JST#3-Dares.VOM
5 KA0034.10.110201.LOV.MAX A JST#3.A JST#3-Dares.VOM
6                KA0034.FN       A JST#3.Some Invent
7         KA0034.FN.010XYZ       A JST#3.Some Invent
8     KA0034.FN.010XYZ.LEX       A JST#3.Some Invent
9 KA0034.FN.010XYZ.LEX.NAT       A JST#3.Some Invent

Output needed:

                   PROJ_ID                                             PROJ_NAME
1                   KA0034                                               A JST#3
2                KA0034.10                                 A JST#3.A JST#3-Dares
3         KA0034.10.110201                   A JST#3.A JST#3-Dares.A JST#3-Dares
4     KA0034.10.110201.LOV               A JST#3.A JST#3-Dares.A JST#3-Dares.VOM
5 KA0034.10.110201.LOV.MAX A JST#3.A JST#3-Dares.A JST#3-Dares.VOM.A JST#3-Dares
6                KA0034.FN                                   A JST#3.Some Invent
7         KA0034.FN.010XYZ                       A JST#3.Some Invent.Some Invent
8     KA0034.FN.010XYZ.LEX           A JST#3.Some Invent.Some Invent.Some Invent
9 KA0034.FN.010XYZ.LEX.NAT   A JST#3.Some Invent.Some Invent.Some Invent.A JST#3

Here the suffix (part after last period) gets concatenated to prefix string's values.

For example:

  1. KA0034.10.110201.LOV is VOM (LOV is the suffix)
  2. But the KA0034.10.110201 is JST#3-Dares (110201 is the suffix)
  3. likewise KA0034.10 is JST#3-Dares (10 is the suffix)
  4. and KA0034 is A JST#3

so resultant string is A JST#3.A JST#3-Dares.A JST#3-Dares.VOM

CaseebRamos
  • 684
  • 3
  • 18

1 Answers1

1

The issue is your iss is updated in every iteration. Your should save the original copy of PROJ_NAME to paste().

i.e.

input <- input %>%
  mutate(K = fun(PROJ_ID),
         saved_proj_name=PROJ_NAME) 


while (TRUE) {
  input <- left_join(input, select(input, PROJ_ID, iss = saved_proj_name), by = c("K" = "PROJ_ID")) %>%
    mutate(
      PROJ_NAME = if_else(is.na(iss), PROJ_NAME,
                          paste(iss, PROJ_NAME, sep = ".")),
      K = fun(K)) %>%
    select(-iss)
  if (all(is.na(input$K))) break
}

print(input)
                   PROJ_ID                                             PROJ_NAME    K saved_proj_name
1                   KA0034                                               A JST#3 <NA>         A JST#3
2                KA0034.10                                 A JST#3.A JST#3-Dares <NA>   A JST#3-Dares
3         KA0034.10.110201                   A JST#3.A JST#3-Dares.A JST#3-Dares <NA>   A JST#3-Dares
4     KA0034.10.110201.LOV               A JST#3.A JST#3-Dares.A JST#3-Dares.VOM <NA>             VOM
5 KA0034.10.110201.LOV.MAX A JST#3.A JST#3-Dares.A JST#3-Dares.VOM.A JST#3-Dares <NA>   A JST#3-Dares
6                KA0034.FN                                   A JST#3.Some Invent <NA>     Some Invent
7         KA0034.FN.010XYZ                       A JST#3.Some Invent.Some Invent <NA>     Some Invent
8     KA0034.FN.010XYZ.LEX           A JST#3.Some Invent.Some Invent.Some Invent <NA>     Some Invent
9 KA0034.FN.010XYZ.LEX.NAT   A JST#3.Some Invent.Some Invent.Some Invent.A JST#3 <NA>         A JST#3
one
  • 3,121
  • 1
  • 4
  • 24