2

I try to reshape the following

country region abc2001 abc2002 xyz2001 xyz2002
Japan East Asia 1 2 4.5 5.5

to the following

country region year abc xyz
Japan East Asia 2001 1 4.5
Japan East Asia 2002 2 5.5

actually there are five more variables in the same way.

I use the following code:

long <- data %>% pivot_longer(cols = c(-country, -region), names_to = c(".value", "year"), names_pattern = "([^\\.]*)\\.*(\\d{4})")

The result is long version of the data except that I lose country and region variables. What do I do wrong? Or how else can I do this better?

Thank you in advance.

Martin Gal
  • 16,640
  • 5
  • 21
  • 39

2 Answers2

1

We may change the regex pattern to match one or more non-digits(\\D+) as the first capture group and one or more digits (\\d+) as the second one

librarytidyr)
pivot_longer(data, cols = c(-country, -region),
    names_to = c(".value", "year"), names_pattern = "(\\D+)(\\d+)")

-output

# A tibble: 2 × 5
  country region    year    abc   xyz
  <chr>   <chr>     <chr> <int> <dbl>
1 Japan   East Asia 2001      1   4.5
2 Japan   East Asia 2002      2   5.5

data

data <- structure(list(country = "Japan", region = "East Asia", abc2001 = 1L, 
    abc2002 = 2L, xyz2001 = 4.5, xyz2002 = 5.5), 
class = "data.frame", row.names = c(NA, 
-1L))
akrun
  • 874,273
  • 37
  • 540
  • 662
1

Update: see comments as @akrun noted, here is better regex with lookaround:

rename_with(., ~str_replace(names(data), "(?<=\\D)(?=\\d)", "\\_"))

First answer:

Here is a version with names_sep. The challenge was to add an underscore in the column names. The preferred answer is that of @akrun:

  • (.*) - Group 1: any zero or more chars as many as possible
  • (\\d{4}$) - Group 2: for digits at the end
library(dplyr)
library(tidyr)

data %>% 
  rename_with(., ~sub("(.*)(\\d{4}$)", "\\1_\\2", names(data))) %>% 
  pivot_longer(-c(country, region),
             names_to =c(".value","Year"),
             names_sep ="_"
             )
  country region    Year    abc   xyz
  <chr>   <chr>     <chr> <int> <dbl>
1 Japan   East Asia 2001      1   4.5
2 Japan   East Asia 2002      2   5.5
TarJae
  • 72,363
  • 6
  • 19
  • 66
  • 2
    You can still do this wihtout adding a delimiter i.e. `(?<=\\D)(?=\\d)` split between the non-digit and digit using regex lookaround – akrun Apr 03 '22 at 15:44