Column Split based on certain separator format

Question

I have a format to separate where I will have this data:

df = data.frame(id=c(1,2),name=c('A~B~C','A~B~D'),value=c('1~2~3','1~~2'))
id   name      value
1    A~B~C     1~2~3
2    A~B~D     1~~2

which is expected to have the following output where the column name is the original column name followed by the text in the name column:

id   value_A   value_B   value_C  value_D
1      1         2          3      
2      1                            2

I manage to achieve the splitting for the by using many nested for loops to process on my data row by row. It works on small sample data but once the data gets huge, the time is an issue.

Also,there could be more than 1 value columns, but they all should map into the same name column. Example output:

id   value_A   value_B   value_C  value1_A   value1_B   value1_C
1      1         2          3       1           2          3
2      1         2          3       1           2          3

Roman · Answer 1 · 2017-08-07T15:29:43.353

2

You can try dplyr:

library(tidyverse)
df %>% 
 separate_rows(name, value, sep = "~") %>% 
 spread(name, value)
  id A B    C    D
1  1 1 2    3 <NA>
2  2 1   <NA>    2

Instead of NA you can fill empty cells by anything you specify within fill = ""

Or baseR and reshape2:

a <- strsplit(as.character(df$name), "~")
b <- strsplit(as.character(df$value), "~")
df2 <- do.call(rbind.data.frame, Map(cbind, df$id, a, b))
library(reshape2)
dcast(df2, V1~V2, value.var = "V3")
  A B C    D   
1 1 2 3    <NA>
2 1   <NA> 2

edited Aug 07 '17 at 15:29

answered Aug 07 '17 at 11:19

Roman

17,008
3
36
49

Is there any way to append value_ (the original column name) into each of the columns? – kaexch Aug 11 '17 at 03:07
tray to add this in the first example `select(id, value_A = A, value_B = B, value_C = C, value_D = D)` or rename the resulting data.frame (saved into variable `df1` for instance) using `colnames(df1)[-1] <- paste0("value_", colnames(df1)[-1])` – Roman Aug 11 '17 at 09:01

score 0 · Accepted Answer · answered Aug 07 '17 at 11:28

0

Here is an option using cSplit/dcast. Split the rows into 'long' format with cSplit and dcast it to 'wide' format

library(splitstackshape)
dcast(cSplit(df, c('name','value'), '~', 'long')[!is.na(value)], id ~ paste0('value_', name))
#   id value_A value_B value_C value_D
#1:  1       1       2       3      NA
#2:  2       1      NA      NA       2

answered Aug 07 '17 at 11:28

akrun

874,273
37
540
662

If lets say I have a value1 and value2 columns. What changes can I make to the`dcast` to make it able to cast the values? I know I can use `cSplit(df,c('name','value1','value2'),'~','long')` to split the 3 columns. – kaexch Aug 11 '17 at 00:56

Column Split based on certain separator format

2 Answers2