
I have a case where the variable names in my data frames encode more than one variable, for example `cs_ta_p50`. I used `melt` to reshape the data into long format, so now I have:

```
variable    value
cs_ta_p50   ...
```

To fix this, I need to split `variable` into two new variables, `type` and `dec`.

I tried:

```r
cbind(mdata, colsplit(mdata$variable, "(\\_p50)", names = c("type", "dec")))
```

But this results in:

```
variable    value   type    dec
cs_ta_p50   ...     cs_ta   NA
```

when what I really need is:

```
variable    value   type    dec
cs_ta_p50   ...     cs_ta   p50
```

I suspect the regular expression is wrong. What should I use instead?

Oscar
Drop all these old packages for reshaping data and use [tidyr](https://blog.rstudio.org/2014/07/22/introducing-tidyr/) instead. It learned from its predecessors' mistakes and does everything much more cleanly. The operation would then be `extract(mdata, variable, c('type', 'dec'), '^(.+)_([^_]+)$')`. – Konrad Rudolph Aug 03 '16 at 13:48
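Why the original call gives `NA`: assuming the `colsplit` here is reshape2's, its pattern is treated as a delimiter, so `_p50` is consumed as the separator and nothing is left over for `dec`. A minimal base R sketch of the same effect, plus one possible fix that captures the text before and after the last underscore with `sub`. The vector `x` and the name `pattern` are only for illustration:

```r
# Example names from the question
x <- c("cs_ta_p50", "cs_df_p60", "cs_jk_p67")

# Using "_p50" as a delimiter: the trailing empty piece is dropped,
# so there is nothing to put in a second column
strsplit(x[1], "_p50")
# [[1]]
# [1] "cs_ta"

# Group 1: everything up to the last underscore (greedy .*)
# Group 2: the run of non-underscores after it
pattern <- "^(.*)_([^_]*)$"
type <- sub(pattern, "\\1", x)
dec  <- sub(pattern, "\\2", x)
data.frame(variable = x, type = type, dec = dec)
```

The greedy `.*` backtracks only as far as the last underscore, which is what keeps `cs_ta` together.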

2 Answers


With `data.table::tstrsplit` you can do it in two lines:

```r
# data
library(data.table)
dt <- data.table(variable = c("cs_ta_p50", "cs_df_p60", "cs_jk_p67"),
                 value = c(1, 2, 3))

# solution: split on every "_", then glue the first two pieces back together
dt[, c('prefix', 'type', 'dec') := tstrsplit(variable, '_')]
dt[, type := paste(prefix, type, sep = '_')]
```

EDIT

Thanks @MichaelChirico, good stuff. The complete solution is then a single line that splits only on the last underscore:

```r
dt[, c('type', 'dec') := tstrsplit(variable, '_(?=[^_]*$)', perl = TRUE)]
```
sbstn
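The lookahead `(?=[^_]*$)` lets an underscore match only if no other underscore follows it, so only the last `_` is used as the split point. `tstrsplit` forwards extra arguments such as `perl = TRUE` to base `strsplit`, so a minimal sketch of the same split in base R (the names `x` and `parts` are illustrative) looks like this:

```r
x <- c("cs_ta_p50", "cs_df_p60", "cs_jk_p67")

# "_" followed by non-underscores up to the end of the string:
# only the last underscore qualifies, so each name splits into two pieces
parts <- strsplit(x, "_(?=[^_]*$)", perl = TRUE)

# Transpose the list of pairs into two columns, roughly what tstrsplit does
type <- vapply(parts, `[`, character(1), 1)
dec  <- vapply(parts, `[`, character(1), 2)
```

Because the split point is defined relative to the end of the string, this does not depend on how many underscores appear before the last one.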

It's a little janky, but this should work:

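```r
library(tidyr)
library(dplyr)  # mutate() and select() come from dplyr, not tidyr

df <- data.frame(variable = c("cs_ta_p50", "cs_df_p60", "cs_jk_p67"),
                 stringsAsFactors = FALSE)

df_new <- df %>%
    mutate(x = variable) %>%                      # copy so `variable` is kept
    separate(x, into = c("type1", "type2", "dec"), sep = "_") %>%
    mutate(type = paste0(type1, "_", type2)) %>%  # re-join the first two pieces
    select(variable, type, dec)

df_new
```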

Output:

```
   variable  type dec
1 cs_ta_p50 cs_ta p50
2 cs_df_p60 cs_df p60
3 cs_jk_p67 cs_jk p67
```
emehex
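Both the three-way `separate` above and the `prefix`/`type` two-liner in the first answer assume every name contains exactly two underscores. A quick base R check with a hypothetical name that has one more underscore (`cs_ta_x_p50`, made up for illustration) shows where that assumption breaks and how splitting only at the last underscore avoids it. With `separate`, the default is to warn and drop the extra piece, so `dec` would come out as `x` for that name:

```r
# Hypothetical name with an extra underscore, for illustration only
x <- c("cs_ta_p50", "cs_ta_x_p50")

# Splitting on every underscore gives a different number of pieces per name
lengths(strsplit(x, "_"))
# [1] 3 4

# Splitting only at the last underscore always yields type and dec
type <- sub("^(.*)_([^_]*)$", "\\1", x)
dec  <- sub("^(.*)_([^_]*)$", "\\2", x)
```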