Given a dataframe, I would like to use strsplit on one of my columns, and return the first element of the vector. Here is the example:

testdf <- data.frame(col1 = c('string1.string2', 'string3.string4'),
                     col2 = c('somevalue', 'someothervalue'),
                     stringsAsFactors = FALSE)

I want to generate a new column such as testdf$col3 <- c('string1', 'string3')

I tried the following:

testdf$col3 <- strsplit(testdf$col1, split = '\\.')[[1]][1]

which, of course, doesn't work. It returns just the first element of the output ('string1') and writes it for the whole column. One solution would be to write a custom function:

customfx <- function(ind_cell) {
  my_out <- strsplit(ind_cell, split = '\\.')[[1]][1]
  return(my_out)
}

Then use it with sapply. I was wondering if there is an alternative to this. The talking stick is yours :)
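For completeness, here is a minimal sketch of what the sapply approach might look like on the example data. The USE.NAMES = FALSE argument is my own addition, not part of the original attempt; without it sapply names each result after its input string.

```r
testdf <- data.frame(col1 = c('string1.string2', 'string3.string4'),
                     col2 = c('somevalue', 'someothervalue'),
                     stringsAsFactors = FALSE)

# Helper from above: split one string on a literal dot, keep the first piece
customfx <- function(ind_cell) {
  strsplit(ind_cell, split = '\\.')[[1]][1]
}

# Apply the helper to every cell of col1;
# USE.NAMES = FALSE returns a plain, unnamed character vector
testdf$col3 <- sapply(testdf$col1, customfx, USE.NAMES = FALSE)
```

This works, but it calls strsplit once per row, which is why I am looking for something more direct.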

Max_IT

1 Answer

You can use sub (which is vectorized) with regex for this:

testdf$col3 <- sub("^([^.]+).*", "\\1", testdf$col1)

testdf
#             col1           col2    col3
#1 string1.string2      somevalue string1
#2 string3.string4 someothervalue string3

Here the pattern ^([^.]+).* matches the whole string and captures everything from the beginning up to (but not including) the first dot. sub then replaces the whole match with the captured group through the back-reference \\1. A string that contains no dot is returned unchanged.
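If you would rather stay with strsplit, one possible alternative is to split the whole column in a single call and then pull the first piece out of each list element. This is only a sketch: the variable names pieces and first_piece are made up for illustration, and fixed = TRUE is used so the dot is treated as a literal character rather than a regex.

```r
testdf <- data.frame(col1 = c('string1.string2', 'string3.string4'),
                     col2 = c('somevalue', 'someothervalue'),
                     stringsAsFactors = FALSE)

# strsplit is vectorized over its input: it returns a list with one
# character vector per row of col1
pieces <- strsplit(testdf$col1, split = '.', fixed = TRUE)

# Take the first element of each vector; vapply checks that every
# result is a single character value
first_piece <- vapply(pieces, function(p) p[1], character(1))

testdf$col3 <- first_piece
```

On the example data this should give the same column as the sub version; which one reads better is mostly a matter of taste.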

Psidom