Split column by separator and delete values contained in other values

Question

I have a category column whose values are separated by ";". For example:

value <- "A > B > C; A > B > D; A > B > C > C1"

It means:

The current product belongs to category "A > B > C", to category "A > B > D" and to category "A > B > C > C1"

If a category is already contained in another one, it should be removed. So the goal is:

expectedResult <- "A > B > D; A > B > C > C1"

because "A > B > C > C1" contains "A > B > C".

How can I solve this?

Note: I know that there are hundreds of questions that seem similar. But I just couldn't find a solution.

score 1 · Answer 1 · answered May 04 '21 at 08:56

This ought to work:


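library(stringr)  # str_detect() and fixed() come from stringr

value <- "A > B > C; A > B > D; A > B > C > C1"
els <- strsplit( value, "; " )[[1]]

# a: categories kept so far, b: the next category
my_reducer  <- function(a,b) {
    v <- str_detect( b, fixed(a) )  # TRUE where a kept category occurs inside b
    a <- a[!v]                      # drop those contained categories
    append(a,b)
}

paste( Reduce( my_reducer, els ), collapse="; " )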

Output:


> paste( Reduce( my_reducer, els ), collapse="; " )
[1] "A > B > D; A > B > C > C1"
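
One caveat: the reducer only compares each new element against the ones kept so far, so a shorter category that comes after the longer one (e.g. "A > B > C > C1; A > B > C") is not removed. A minimal base R sketch of an order-independent variant, assuming plain substring containment is what is wanted; `drop_contained` is a hypothetical name, not part of the answer above:

```r
# Sketch: drop every category that occurs inside another, different category.
# drop_contained is a hypothetical helper name.
drop_contained <- function(value, sep = "; ") {
  # unique() so that exact duplicates do not eliminate each other
  els <- unique(strsplit(value, sep, fixed = TRUE)[[1]])
  # contained[i] is TRUE when els[i] is a substring of some other element
  contained <- vapply(seq_along(els), function(i) {
    any(grepl(els[i], els[-i], fixed = TRUE))
  }, logical(1))
  paste(els[!contained], collapse = sep)
}

drop_contained("A > B > C; A > B > D; A > B > C > C1")
drop_contained("A > B > C > C1; A > B > D; A > B > C")
```

Both calls drop "A > B > C", regardless of where it sits in the string.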

Sirius


I like the use of Reduce. How would I have to adapt it if the categories had a prefix (an ID) that should be disregarded in the comparison but kept in the result? For example: value <- "c090501221> A> B> C; c090601221> A> B> D; c090501222> A> B> C> C1" should give result = "c090601221> A> B> D; c090501222> A> B> C> C1" – Axel K May 04 '21 at 09:34

I believe I got it: my_reducer <- function(a,b) { v <- str_detect( b, fixed(str_sub(a, start = 6, end = nchar(a)))); a <- a[!v]; append(a,b) } – Axel K May 04 '21 at 09:41
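
Note that a fixed offset of `start = 6` skips only five characters, while the IDs in the example above ("c090501221") are ten characters long, so the offset would have to match the real ID length. A sketch that avoids hard-coding the length, assuming the ID is everything up to the first ">"; `strip_id` and `my_reducer_id` are hypothetical names and only base R is used:

```r
# Hypothetical helper: remove the ID, i.e. everything up to and including
# the first ">" plus any whitespace that follows it (an assumption about the format).
strip_id <- function(x) sub("^[^>]*>[[:space:]]*", "", x)

# Same idea as my_reducer, but the containment test ignores the ID prefix.
my_reducer_id <- function(a, b) {
  # v[i] is TRUE when the ID-free path of a[i] occurs in the ID-free path of b
  v <- vapply(strip_id(a), function(p) grepl(p, strip_id(b), fixed = TRUE), logical(1))
  append(a[!v], b)  # the kept elements still carry their IDs
}

value <- "c090501221> A> B> C; c090601221> A> B> D; c090501222> A> B> C> C1"
els <- strsplit(value, "; ", fixed = TRUE)[[1]]
result <- paste(Reduce(my_reducer_id, els), collapse = "; ")
result
```

For the example value this gives the result asked for in the comment, with the IDs preserved.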

score 0 · Answer 2 · answered May 04 '21 at 08:59

Perhaps you can try the code below

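v <- unlist(strsplit(value, ";\\s+"))
# column x of the matrix marks the elements of v that contain x plus further text
idx <- colSums(`diag<-`(sapply(v, function(x) {
  p <- gsub(x, "", v, fixed = TRUE)
  p != v & nchar(p) > 0
}), FALSE)) == 0
# keep only the categories that no other element contains
paste0(names(idx)[idx], collapse = "; ")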

which gives

[1] "A > B > D; A > B > C > C1"
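
Since the question is about a whole column, here is a rough sketch of applying the same idea row by row. The data frame `df`, its column name `category` and the helper `clean_one` are made up for illustration, and missing values are not handled:

```r
# Hypothetical data; the column name `category` is an assumption.
df <- data.frame(
  id = 1:2,
  category = c("A > B > C; A > B > D; A > B > C > C1", "X > Y; X > Y > Z"),
  stringsAsFactors = FALSE
)

# Clean a single cell: split, drop contained categories, paste back together.
clean_one <- function(value) {
  v <- unlist(strsplit(value, ";\\s+"))
  # keep is FALSE for an element that occurs inside a different element of v
  keep <- !vapply(v, function(x) any(grepl(x, v[v != x], fixed = TRUE)), logical(1))
  paste(v[keep], collapse = "; ")
}

# vapply() runs clean_one on every cell and returns a character vector
df$category <- vapply(df$category, clean_one, character(1), USE.NAMES = FALSE)
df$category
```

With this toy data the first row keeps "A > B > D; A > B > C > C1" and the second row keeps only "X > Y > Z".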
