I have a large data set; a small sample of it looks like the 4 x 5 tibble below. I'm trying to split several delimited columns into separate rows, but only for the rows where c == "Split",
as below:
library(tibble)
library(splitstackshape)
dt <- tibble(
a = c("Quartz | White Spirit | Wildfire", "Quiet Riot", "Race Against Time", "Down | Heart Lane | X | Breaking H"),
b = c("Muthas Pride", "Killer Girls / Slick Black Cadillac", "Demo 1980", "Life 55"),
c = c("Split", "Single", "Demo", "Split"),
d = c("Birmingham, England | Hartlepool, England | Sheffield, South Yorkshire, England", "Los Angeles, California", "Nottingham, England", "Liverpool | Beijing | | NYC"),
e = c("wf | ef | ff", "g", "f", "cf | af | df | rf")
)
dt.s <- subset(dt, c == "Split")
dt.split <- cSplit(dt.s, c("a", "d", "e"), c("|", "|", "|"), "long")
dt.split
However, this produces an extra row of NAs, as seen in row 4:
a b c d e
1: Quartz Muthas Pride Split Birmingham, England wf
2: White Spirit Muthas Pride Split Hartlepool, England ef
3: Wildfire Muthas Pride Split Sheffield, South Yorkshire, England ff
4: NA Muthas Pride Split NA NA
5: Down Life 55 Split Liverpool cf
6: Heart Lane Life 55 Split Beijing af
7: X Life 55 Split df
8: Breaking H Life 55 Split NYC rf
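As a sanity check, here is a base-R sketch that is separate from the cSplit call. It rebuilds only the two "Split" rows and counts how many pieces each cell yields when split on a literal "|". The name `counts` is only illustrative.

```r
# Rebuild only the two "Split" rows from the sample above (base R, no packages).
dt.s <- data.frame(
  a = c("Quartz | White Spirit | Wildfire", "Down | Heart Lane | X | Breaking H"),
  d = c("Birmingham, England | Hartlepool, England | Sheffield, South Yorkshire, England",
        "Liverpool | Beijing | | NYC"),
  e = c("wf | ef | ff", "cf | af | df | rf"),
  stringsAsFactors = FALSE
)

# Count the pieces each cell yields when split on a literal "|".
# The result is a matrix with one row per input row and one column per variable.
counts <- sapply(dt.s, function(col) lengths(strsplit(col, "|", fixed = TRUE)))
counts
```

Within each row the three columns yield the same number of pieces (3 for the first row, 4 for the second), so the extra row does not seem to come from mismatched lengths within a row.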
This is not a problem if I split only two columns. How do I get it to not produce the NA row? And is there a way to make cSplit work without subsetting by c?