I have a category column whose entries are separated by ";". For example:

value <- "A > B > C; A > B > D; A > B > C > C1"

It means:

The current product belongs to category "A > B > C", to category "A > B > D" and to category "A > B > C > C1"

If a category is already contained in another one, it should be removed. So the goal is:

expectedResult <- "A > B > D; A > B > C > C1"

because "A > B > C > C1" contains "A > B > C".

How can I solve this?

Note: I know that there are hundreds of questions that seem similar. But I just couldn't find a solution.

zx8754
Axel K

2 Answers

This ought to work:


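library(stringr)  # provides str_detect() and fixed()

value <- "A > B > C; A > B > D; A > B > C > C1"
els <- strsplit( value, "; " )[[1]]

# a: the elements kept so far, b: the next element
my_reducer  <- function(a,b) {
    v <- str_detect( b, fixed(a) )  # which kept elements occur inside b?
    a <- a[!v]                      # drop those
    append(a,b)                     # then keep b
}

paste( Reduce( my_reducer, els ), collapse="; " )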

Output:


> paste( Reduce( my_reducer, els ), collapse="; " )
[1] "A > B > D; A > B > C > C1"
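
Two caveats with the reducer above: it only drops an earlier element that occurs inside a later one, so "A > B > C > C1; A > B > C" would keep both, and a plain substring test also treats "A > B > C" as contained in "A > B > C1". A minimal base R sketch that avoids both, assuming "contained" means "is a leading run of levels of another category" (that reading, and the helper names `levels_of`, `is_ancestor` and `redundant`, are illustrative assumptions, not part of the question):

```r
value <- "A > B > C; A > B > D; A > B > C > C1"
els <- trimws(strsplit(value, ";", fixed = TRUE)[[1]])

# Split every category into its levels, e.g. c("A", "B", "C")
levels_of <- lapply(strsplit(els, ">", fixed = TRUE), trimws)

# TRUE when a is a strictly shorter leading run of b (hypothetical helper)
is_ancestor <- function(a, b) {
  length(a) < length(b) && identical(a, b[seq_along(a)])
}

# An element is redundant if it is an ancestor of any other element,
# regardless of the order in which the elements appear
redundant <- vapply(seq_along(levels_of), function(i) {
  any(vapply(levels_of[-i],
             function(b) is_ancestor(levels_of[[i]], b),
             logical(1)))
}, logical(1))

result <- paste(els[!redundant], collapse = "; ")
result
```

Comparing whole levels rather than raw text costs a few more lines, but it does not depend on the order of the entries or on one level name being a prefix of another.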

Sirius
  • I like the use of Reduce. How would I have to adapt it if the categories had a prefix (id) that should be disregarded in the comparison but kept in the result? For example: value <- "c090501221> A> B> C; c090601221> A> B> D; c090501222> A> B> C> C1" result = "c090601221> A> B> D; c090501222> A> B> C> C1" – Axel K May 04 '21 at 09:34
  • I believe I got it: my_reducer <- function(a,b) { v <- str_detect( b, fixed(str_sub(a, start = 6, end = nchar(a)))) a <- a[!v] append(a,b) } – Axel K May 04 '21 at 09:41
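
A fixed character offset only works if it matches the id length; with the 10-character ids in the example above, `start = 6` would still leave part of the id in the pattern. A hedged alternative in base R strips the id with a regular expression instead, assuming the id is everything up to and including the first ">" (that assumption and the names `paths` and `contained` are illustrative only):

```r
value <- "c090501221> A> B> C; c090601221> A> B> D; c090501222> A> B> C> C1"
els <- strsplit(value, ";\\s*")[[1]]

# Assumption: the id is everything up to and including the first ">"
paths <- trimws(sub("^[^>]*>", "", els))

# TRUE when path i occurs inside a different path (substring test,
# as in the answer above); ids are ignored because paths has none
contained <- vapply(seq_along(paths), function(i) {
  others <- paths[-i]
  any(others != paths[i] & grepl(paths[i], others, fixed = TRUE))
}, logical(1))

# Keep the original elements (ids included); if two entries share the
# same path, keep only the first one
result <- paste(els[!contained & !duplicated(paths)], collapse = "; ")
result
```

The comparison runs on the id-free `paths`, while the result is built from the untouched `els`, so the ids survive in the output.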

Perhaps you can try the code below

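v <- unlist(strsplit(value, ";\\s+"))
# Column x of the matrix marks the elements of v that properly contain x
idx <- colSums(`diag<-`(sapply(v, function(x) {
  p <- gsub(x, "", v, fixed = TRUE)
  p != v & nchar(p) > 0
}), FALSE)) == 0
# Keep the elements that no other element contains
paste0(names(idx)[idx], collapse = "; ")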

which gives

[1] "A > B > D; A > B > C > C1"
ThomasIsCoding