1

I am looking for the most efficient form to transform

  ARTNR           FILGRP
1     1             9827
2     2             9348
3     3 9335, 9827, 9339

into this

  ARTNR      FILGRP
1     1      9827
2     2      9348
3     3      9335
4     3      9827
5     3      9339

I tried the following code and it works, but it is not elegant and has some shortcomings. :

setDT(artnrs)  
artnrs[, c("P1", "P2", "P3") := tstrsplit(FILGRP, ",", fixed=TRUE)] # 1)
artnrs <- melt(artnrs, c("ARTNR"), measure = patterns("^P")) # 2)
artnrs[,variable:=NULL] # 3)
artnrs <- na.omit(artnrs, cols="value") # 4)
names(artnrs)[2] <- "FILGRP" # 5)
  • ad 1) splits the last column in three new ones. How can I make this dynamic and make it fit for five or ten?
  • ad 2-5) rather clumpsy operations, could I chain this better?

It is based on data.table but performance is not that critical so an easy to understand tidyverse solution would be ok. But the fewer packages, the better.

Thanks!

dput output;

structure(list(ARTNR = c(1, 2, 3), FILGRP = c("9827", "9348", "9335, 9827, 9339")), 
row.names = c(NA, -3L), class = "data.frame")
mullet
  • 15
  • 7
  • You'd want to use an `unlist` in there (not using `:=`). Or you can use `cSplit` from my "splitstackshape" package. Or you can use `separate_rows` from "tidyr". – A5C1D2H2I1M1N2O1R2T1 Jan 28 '21 at 19:02
  • 3
    Just to elaborate you can try any of the following `artnrs[, list(FILGRP = unlist(tstrsplit(FILGRP, ",", type.convert = TRUE))), ARTNR]`, or `cSplit(artnrs, "FILGRP", direction = "long")`, or `separate_rows(artnrs, FILGRP)`. – A5C1D2H2I1M1N2O1R2T1 Jan 28 '21 at 19:07

2 Answers2

1
df <- structure(list(ARTNR = c(1, 2, 3), FILGRP = c("9827", "9348", "9335, 9827, 9339")), 
          row.names = c(NA, -3L), class = "data.frame")

df2 <- strsplit(df$FILGRP, split = ",")
df2 <- data.frame(ARTNR = rep(df$ARTNR, sapply(df2, length)), FILGRP = unlist(df2))
neuron
  • 1,949
  • 1
  • 15
  • 30
0

here is a data.table approach

library( data.table )
setDT(DT)

melt( DT[, paste0( "v", 1:length(tstrsplit( DT$FILGRP, ", ") ) ) := tstrsplit( FILGRP, ", ") ],
      id.vars = "ARTNR", 
      measure.vars = patterns( "^v" ),
      value.name = "FILGRP" )[!is.na(FILGRP), .SD, .SDcols = c(1,3) ]


#    ARTNR FILGRP
# 1:     1   9827
# 2:     2   9348
# 3:     3   9335
# 4:     3   9827
# 5:     3   9339
Wimpel
  • 26,031
  • 1
  • 20
  • 37