0

I have a dataframe in long table format like this one below:

mydf <- data.frame(id = c(123, 123, 123),
                   name =c("test_2001", "test_area_2002", "test_area_sqkm_2003"),
                   value = c(15, 20, 25))
mydf

#>     id                 name  value
#> 1  123            test_2001     15
#> 2  123       test_area_2002     20
#> 3  123  test_area_sqkm_2003     25

After taking reference from this question here, I applied function separate from package tidyr:

library(dplyr)
library(tidyr)

mydf %>% 
  separate(name, c("name","year"), extra="merge", fill = "left")

#>     id  name            year  value
#> 1  123  test            2001     15
#> 2  123  test       area_2002     20
#> 3  123  test  area_sqkm_2003     25

But I couldn't figure out how to split characters and numbers into two different columns like this:

desired.df

#>      id           name  year value
#> 1   123           test  2001    15
#> 2   123      test_area  2002    20
#> 3   123 test_area_sqkm  2003    25

Created on 2021-12-15 by the reprex package (v2.0.0)

  • 1
    Just set `sep="_(?=[^_]+$)"`, as answered in [this post](https://stackoverflow.com/questions/50518137/separate-a-column-into-2-columns-at-the-last-underscore-in-r) because you essentially want to split on last `_`. – caldwellst Dec 15 '21 at 09:59

2 Answers2

1

Is something like the following you are looking for?

library(tidyverse)

mydf <- data.frame(id = c(123, 123, 123),
                   name =c("test_2001", "test_area_2002", "test_area_sqkm_2003"),
                   value = c(15, 20, 25))

mydf %>% 
  separate(name, sep="_(?=\\d+$)", into = c("name", "year"))

#>    id           name year value
#> 1 123           test 2001    15
#> 2 123      test_area 2002    20
#> 3 123 test_area_sqkm 2003    25
PaulS
  • 21,159
  • 2
  • 9
  • 26
1

As a bit of an explanation (?<=[a-z])_(?=[1-9]) matches an _, then looks forward for a digit, (?=[1-9]) and looks back for a letter, (?<=[a-z]), since that's what we want to split the string on.

library(tidyr)
library(magrittr)
df %>% 
    separate(name, sep="(?<=[a-z])_(?=[1-9])", into=c("name", "year"))
   id           name year value
1 123           test 2001    15
2 123      test_area 2002    20
3 123 test_area_sqkm 2003    25
user438383
  • 5,716
  • 8
  • 28
  • 43