I am working with a data set whose values need to be split on several delimiters. The most common are a semicolon and a period followed by a slash: './'.

How can I extend the code below so that it splits on both delimiters?

library(tidyverse)
library(splitstackshape)

values <- c("cat; dog; mouse", "cat ./ dog ./ mouse")
data <- data.frame(cbind(values))

separated <- cSplit(data.frame(data), "values", sep = ";", drop = TRUE)

I tried a vector solution but without much success.

onlyjust17

1 Answer

I'm not exactly sure what your final output structure should be, but one approach is to start with tidyr::separate, which puts each animal in its own column:

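# sep is a regular expression: split on ";" or a literal "./"
# (the dot is escaped), and drop any whitespace around the delimiter
df <- tidyr::separate(data, col = values, 
                into = c("Animal1", "Animal2", "Animal3"), 
                sep = "\\s*(;|\\./)\\s*")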

#   Animal1 Animal2 Animal3
#1     cat     dog   mouse
#2     cat     dog   mouse
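One detail worth checking in a delimiter pattern such as ";|./" is that an unescaped dot matches any character, so any character followed by a slash counts as a delimiter. The sketch below uses base R's strsplit and a made-up value (not from the question's data) that contains a plain slash, to show the difference between the unescaped and escaped forms:

```r
# Hypothetical value: "cat/dog" contains a slash that is NOT a delimiter.
x <- "cat/dog ./ mouse"

# Unescaped dot: "t/" also matches, so "cat/dog" is broken apart.
unescaped <- strsplit(x, ";|./")[[1]]
unescaped
# [1] "ca"     "dog "   " mouse"

# Escaped dot: only ";" or a literal "./" splits the string;
# \\s* also removes the spaces around the delimiter.
escaped <- strsplit(x, "\\s*(;|\\./)\\s*")[[1]]
escaped
# [1] "cat/dog" "mouse"
```

On the question's sample data both patterns split in the same places, so the escaping only matters once a value contains a slash that is not part of './'.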

If the number of elements per string is not known in advance, you could instead normalise the delimiters first and let cSplit work out how many columns are needed:

# Add in a third value to data with only 2 animals
values <- c("cat; dog; mouse", "cat ./ dog ./ mouse", "frog; squirrel")
data <- data.frame(cbind(values))


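# Replace every literal "./" with ";" so only one delimiter remains
# (the dot is escaped so it does not match an arbitrary character)
data_clean <- gsub(";|\\./", ";", data$values)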
separated <- splitstackshape::cSplit(data.frame(values = data_clean), 
                                     "values", sep = ";", drop = TRUE)

#    values_1 values_2 values_3
# 1:      cat      dog    mouse
# 2:      cat      dog    mouse
# 3:     frog squirrel     <NA>
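For comparison, the same idea can be sketched in base R without normalising the delimiters first. This is a minimal sketch rather than a replacement for cSplit: the input values are made up for illustration (the third one mixes both delimiters in a single cell), and the column names are chosen to mirror the output above.

```r
# Hypothetical input: the third value mixes ";" and "./" in one cell.
values <- c("cat; dog; mouse", "cat ./ dog", "frog; squirrel ./ owl; newt")

# Split on ";" or a literal "./", dropping whitespace around the delimiter.
parts <- strsplit(values, "\\s*(;|\\./)\\s*")

# The row with the most elements sets the number of columns;
# shorter rows are padded on the right with NA.
n_col <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep(NA_character_, n_col - length(p))))

# Bind the padded rows into a data frame with one column per element.
separated <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(separated) <- paste0("values_", seq_len(n_col))
separated
```

Because the split pattern accepts either delimiter at every position, a cell that contains both is handled the same way as a cell that contains only one.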
jpsmith
  • This seems like a wonderful solution, but what if I don't know how many elements are in the cell I want to split? I am working with a huge data set and the cell with the most elements should determine the number of columns. If the other rows don't have that many elements, then the code should put NAs in the remaining cells. – onlyjust17 Sep 26 '22 at 12:28
  • @onlyjust17 Does the edit do the trick? (I added in a third part to your data and I tried to stick to your original code as much as possible) – jpsmith Sep 26 '22 at 12:53
  • It's a great idea and works very well, but it now turns out that some elements contain both delimiters. I think I can handle this problem. Thank you very much for your help. – onlyjust17 Sep 26 '22 at 16:10