Extracting a string from one column into another in R

Question

I have an example data frame like the one below.

ID	File
1	11_213.csv
2	13_256.csv
3	11_223.csv
4	12_389.csv
5	14_456.csv
6	12_345.csv

And I want to add another column based on the string between the underscore and the period to get a data frame that looks something like this.

ID	File	Group
1	11_213.csv	213
2	13_256.csv	256
3	11_223.csv	223
4	12_389.csv	389
5	14_456.csv	456
6	12_345.csv	345

I think I need to use the str_extract feature within stringr but I am not sure what notation to use for my pattern. For example when I use:

df <- df %>%
mutate("Group" = str_extract(File, "[^_]+"))

I get all the information before the underscore, like this:

ID	File	Group
1	11_213.csv	11
2	13_256.csv	13
3	11_223.csv	11
4	12_389.csv	12
5	14_456.csv	14
6	12_345.csv	12

But that is not what I want. What should I use instead of "[^_]+" to get just the stuff between the underscore and the period? Thanks!

You need `str_extract(File, "(?<=_)(\\d+)(?=\\.)")` – akrun, Mar 08 '21 at 17:14

score 7 · Accepted Answer · answered Mar 08 '21 at 17:21

We can use regex lookarounds with str_extract to extract the digits (\\d+) that follow a _ and precede a .

library(dplyr)
library(stringr)
df <- df %>%
    mutate(Group = str_extract(File, "(?<=_)(\\d+)(?=\\.)"))

Another option is to remove the unwanted parts with str_remove_all: the pattern matches everything up to and including the _ (.*_) or (|) everything from the . onwards (\\..*). Because . matches any character in a regex, we escape it with \\ to match a literal period.

df <- df %>%
        mutate(Group = str_remove_all(File, ".*_|\\..*"))
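
As a minimal sketch for checking the two patterns side by side, the question's data can be rebuilt by hand. The column names Lookaround and Removed are only illustrative and are not part of the original answer; the final conversion to integer is an optional extra step, since both functions return character vectors.

```r
library(dplyr)
library(stringr)

# Sample data rebuilt from the question
df <- data.frame(
  ID = 1:6,
  File = c("11_213.csv", "13_256.csv", "11_223.csv",
           "12_389.csv", "14_456.csv", "12_345.csv"),
  stringsAsFactors = FALSE
)

out <- df %>%
  mutate(
    # digits preceded by "_" and followed by a literal "."
    Lookaround = str_extract(File, "(?<=_)\\d+(?=\\.)"),
    # delete everything up to the "_" and everything from the "." onwards
    Removed = str_remove_all(File, ".*_|\\..*")
  )

# Both new columns are character; convert if numeric groups are needed
out$Group <- as.integer(out$Lookaround)
```

On this sample the two columns agree. They can differ on file names with several underscores or periods, so it is worth testing against the real data.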

score 3 · Answer 2 · answered Mar 08 '21 at 22:13


A base R option using gsub

transform(
  df,
  Group = gsub(".*_(\\d+)\\..*", "\\1", File)
)

gives

  ID       File Group
1  1 11_213.csv   213
2  2 13_256.csv   256
3  3 11_223.csv   223
4  4 12_389.csv   389
5  5 14_456.csv   456
6  6 12_345.csv   345
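
One thing to keep in mind with the substitution approach: when the pattern does not match, sub and gsub return the input unchanged rather than NA. The sketch below illustrates this with a hypothetical file name ("notes.txt", not from the question) and shows one possible guard using grepl; it is an assumption about how you might want non-matching names handled, not part of the original answer.

```r
# "notes.txt" is a hypothetical file name with no "_<digits>." part
files <- c("11_213.csv", "notes.txt")
pattern <- ".*_(\\d+)\\..*"

# sub() leaves non-matching input untouched
plain <- sub(pattern, "\\1", files)

# Guard with grepl() to get NA for non-matching names instead
guarded <- ifelse(grepl(pattern, files),
                  sub(pattern, "\\1", files),
                  NA_character_)
```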

answered Mar 08 '21 at 22:13 by ThomasIsCoding
