Get a vector which contains the string before one specific character from each element in a dataframe column

Question

I have a dataframe that looks like this:

df = data.frame(V1= c(1:3), V2 = c("abc-1-10", "def-2-19", "ghi-3-937"))

I would like to add a column V3 to this dataframe that contains the part of each string in column V2 before the first hyphen ("-"). So it should look like this:

dfDesired = data.frame(V1= c(1:3), V2 = c("abc-1-10", "def-2-19", "ghi-3-937"), V3 = c("abc", "def","ghi"))

When I try using strsplit, it gives me a list of vectors containing the three parts of each string.

Maël · Answer 1 · 2022-04-14T12:45:16.507

2

A way that needs little regex knowledge is tidyr::separate:

tidyr::separate(df, V2, into = "V3", extra = "drop", remove = FALSE, sep = "-")

  V1        V2  V3
1  1  abc-1-10 abc
2  2  def-2-19 def
3  3 ghi-3-937 ghi

edited Apr 14 '22 at 12:45

answered Apr 14 '22 at 12:34

Maël


This is a good solution. However, if the string before the first hyphen has underscores (mine do), it does not work as it also removes everything after the underscore – ag14 Apr 14 '22 at 12:45

See edit. You can modify the sep argument. – Maël Apr 14 '22 at 12:45
Lovely! This solves the problem – ag14 Apr 14 '22 at 12:48

score 2 · Answer 2 · answered Apr 14 '22 at 12:40


You can drop everything from the first hyphen onward.

In base R,

transform(df, V3 = sub('-.*', '', V2))

#  V1        V2  V3
#1  1  abc-1-10 abc
#2  2  def-2-19 def
#3  3 ghi-3-937 ghi
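As a minimal sketch of why this works when the prefix contains underscores (the case raised in the comments on the other answer): sub() replaces only the first match, and "-.*" starts matching at the first hyphen, so any other punctuation before it is left alone. The fourth value "j_k-4-1" is a made-up example, not from the question, and the strsplit variant is just one possible way to get a plain vector from the list the question mentions.

```r
# Data from the question plus one hypothetical value with an underscore
df <- data.frame(V1 = 1:4,
                 V2 = c("abc-1-10", "def-2-19", "ghi-3-937", "j_k-4-1"),
                 stringsAsFactors = FALSE)

# "-.*" matches from the first hyphen to the end of the string;
# sub() replaces that single match with an empty string
df$V3 <- sub("-.*", "", df$V2)

# Same result without a regex: split on a literal "-" and keep
# the first piece of each element of the resulting list
first_piece <- vapply(strsplit(df$V2, "-", fixed = TRUE),
                      `[`, FUN.VALUE = character(1), 1)

# Both approaches should agree
stopifnot(identical(df$V3, first_piece))
```

Using vapply rather than sapply is a design choice here: it fails early if any element does not yield exactly one string.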

answered Apr 14 '22 at 12:40

Ronak Shah


Also a great solution if one is comfortable using regex. – ag14 Apr 14 '22 at 12:49
