
I have a data frame like so:

before <- data.frame(Var1 =
  c("174_1","174_1","174_2","174_3","175_1","175_1"))

I would like to add another column Var2 that contains the part of the expression in Var1 before the underscore. The new column would appear as follows:

after <- data.frame(Var1 =
  c("174_1","174_1","174_2","174_3","175_1","175_1"), Var2 =
  c("174","174","174","174","175","175"))

I believe functions like grepl() could be useful for this; however, I do not know how to keep only the part of each string that comes before the underscore.

Danielle

3 Answers

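df1$b <- substr(df1$a, 1, regexpr('_', df1$a) - 1)

This takes the substring of each value up to, but not including, the first underscore. regexpr() returns one match position per element, so the stop index is computed row by row and the prefixes do not all have to be the same length.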
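To make the row-by-row behaviour concrete, here is a minimal sketch. The data are hypothetical (not the asker's) and were chosen so that the underscore sits at a different position in each value; `pos` is just an illustrative name.

```r
# Hypothetical data: the prefix before the underscore varies in length
df1 <- data.frame(a = c("174_1", "17_22", "1750_3"), stringsAsFactors = FALSE)

# regexpr() returns the position of the first match for each element
pos <- regexpr("_", df1$a, fixed = TRUE)

# Keep everything before that position, element by element
df1$b <- substr(df1$a, 1, pos - 1)
df1$b
```

For a value with no underscore, regexpr() returns -1, so substr() gives an empty string for that row. Handle that case separately if it can occur in your data.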

3pitt
  • `substr` and `regexpr` are vectorised, so you can just do: `substr(before$Var1, 1, regexpr('_', before$Var1) - 1)` – thelatemail Jan 31 '18 at 22:10
  • And if I wanted everything after the underscore, how would that look? – Danielle Feb 02 '18 at 03:47
  • @Danielle Two changes: add to the index of the search result rather than subtract from it, and use the regex result as the lower bound of the substring rather than the upper. The following should work: `substr(df1$a, regexpr('_', df1$a) + 1, nchar(as.character(df1$a)))` – 3pitt Feb 02 '18 at 10:20
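As a possible alternative to the index arithmetic discussed in the comments, both halves can also be extracted with `sub()` and a regular expression. This is only a sketch of a different technique, not the method from the answer above; `Var3` is a hypothetical column name introduced for illustration.

```r
before <- data.frame(Var1 = c("174_1", "174_1", "174_2", "174_3", "175_1", "175_1"))

# Drop the first underscore and everything after it: keeps the prefix
before$Var2 <- sub("_.*$", "", before$Var1)

# Drop everything up to and including the first underscore: keeps the suffix
before$Var3 <- sub("^[^_]*_", "", before$Var1)

before
```

`sub()` coerces its input with `as.character()`, so this should behave the same whether `Var1` is stored as character or as a factor.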

Use tidyr::separate:

d = data.frame(Var1 = c("174_1","174_1","174_2","174_3","175_1","175_1"))
temp = tidyr::separate(d, Var1, into=c("v1", "v2"), sep="_")
temp
   v1 v2
1 174  1
2 174  1
3 174  2
4 174  3
5 175  1
6 175  1
d[["Var2"]] <- temp[["v1"]]
kgolyaev
before <- data.frame(Var1= c("174_1","174_1","174_2","174_3","175_1","175_1"))

after <- data.frame(Var1 = before$Var1,Var2 = unlist(lapply(strsplit(as.character(before$Var1), '_'), `[[`,1)))
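The one-liner above splits each string on the underscore and keeps the first piece. A slightly more explicit sketch of the same idea is shown here, using `vapply()` in place of `unlist(lapply(...))` so that the result is checked to be exactly one string per row. `parts` is an illustrative name.

```r
before <- data.frame(Var1 = c("174_1", "174_1", "174_2", "174_3", "175_1", "175_1"))

# Split every value on the literal underscore; the result is a list of character vectors
parts <- strsplit(as.character(before$Var1), "_", fixed = TRUE)

# Take the first piece of each split; vapply() enforces one string per element
after <- data.frame(Var1 = before$Var1,
                    Var2 = vapply(parts, `[[`, character(1), 1))
after
```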