Make new column that contains part of expression in another column

Question

I have a data frame like so:

 before<- data.frame( Var1= 
  c("174_1","174_1","174_2","174_3","175_1","175_1"))

I would like to add another column Var2 that contains the part of the expression in Var1 before the underscore. The new column would appear as follows:

after<- data.frame( Var1= 
  c("174_1","174_1","174_2","174_3","175_1","175_1"), Var2= 
  c("174","174","174","174","175","175"))

I believe functions like grepl() could be useful for this; however, I do not know how to keep only the part of the expression before the "_".

Either this question - https://stackoverflow.com/questions/10617702/remove-part-of-string-after - or one of the linked duplicates should solve your issue. — thelatemail, Jan 31 '18 at 22:08

score 1 · 3pitt · Accepted Answer · 2018-02-02T10:25:07.800

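df1$b <- substr(df1$a, 1, regexpr('_', df1$a) - 1)

This takes the substring of each element from its first character up to, but not including, the first underscore. Here df1$a plays the role of before$Var1 in the question.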
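
As a quick check of the substring approach, here is a minimal sketch on values whose underscores sit at different positions. The vector `x` is a made-up example, not data from the question, and the sketch assumes every element contains an underscore (for an element without one, `regexpr()` returns -1 and `substr()` then gives an empty string).

```r
# Hypothetical input: prefixes of different lengths, so the underscore
# position differs from element to element.
x <- c("174_1", "1750_22", "9_3")

# regexpr() returns the position of the first match in each element;
# fixed = TRUE treats "_" as a literal string rather than a regex.
pos <- regexpr("_", x, fixed = TRUE)

# substr() recycles start/stop, so each element gets its own stop position.
prefix <- substr(x, 1, pos - 1)
prefix  # "174" "1750" "9"
```

Indexing the `regexpr()` result with `[1]` would reuse the first element's underscore position for every row, which only works when all prefixes have the same length.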

edited Feb 02 '18 at 10:25

answered Jan 31 '18 at 22:08


`substr` and `regexpr` are vectorised, so you can just do: `substr(before$Var1, 1, regexpr('_', before$Var1) - 1)` – thelatemail Jan 31 '18 at 22:10
And if I wanted everything after the underscore, how would that look? – Danielle Feb 02 '18 at 03:47
@Danielle Two changes: add to the index of the search result rather than subtract from it, and use the regex result as the lower bound of the substring rather than the upper. The following should work: `substr(df1$a, regexpr('_', df1$a) + 1, nchar(as.character(df1$a)))` – 3pitt Feb 02 '18 at 10:20
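
Both directions discussed in these comments can also be sketched with `sub()`, in the spirit of the linked question on removing part of a string. This is an alternative technique, not the answer author's code; `Var3` is a hypothetical column name used only for illustration, and `stringsAsFactors = FALSE` is set so the columns are plain character vectors.

```r
# Same values as in the question, kept as character strings.
before <- data.frame(
  Var1 = c("174_1", "174_1", "174_2", "174_3", "175_1", "175_1"),
  stringsAsFactors = FALSE
)

# Keep the part before the underscore: drop the first "_" and
# everything that follows it.
before$Var2 <- sub("_.*$", "", before$Var1)

# Keep the part after the underscore: drop everything up to and
# including the first "_". (Var3 is a hypothetical column name.)
before$Var3 <- sub("^[^_]*_", "", before$Var1)
```

Unlike the `substr()` version, `sub()` returns an element unchanged when it contains no underscore.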

score 1 · Answer 2 · answered Jan 31 '18 at 22:08

Use tidyr::separate:

d = data.frame(Var1 = c("174_1","174_1","174_2","174_3","175_1","175_1"))
temp = tidyr::separate(d, Var1, into=c("v1", "v2"), sep="_")
temp
#>    v1 v2
#> 1 174  1
#> 2 174  1
#> 3 174  2
#> 4 174  3
#> 5 175  1
#> 6 175  1
d[["Var2"]] <- temp[["v1"]]

score 1 · Answer 3 · answered Jan 31 '18 at 22:22

before <- data.frame(Var1= c("174_1","174_1","174_2","174_3","175_1","175_1"))

after <- data.frame(Var1 = before$Var1,Var2 = unlist(lapply(strsplit(as.character(before$Var1), '_'), `[[`,1)))

answered Jan 31 '18 at 22:22

Albert Simmons
