I lose the constant variables (including id) when using pivot_longer with multiple variables

Question

I try to reshape the following

country	region	abc2001	abc2002	xyz2001	xyz2002
Japan	East Asia	1	2	4.5	5.5

to the following

country	region	year	abc	xyz
Japan	East Asia	2001	1	4.5
Japan	East Asia	2002	2	5.5

actually there are five more variables in the same way.

I use the following code:

long <- data %>% pivot_longer(cols = c(-country, -region), names_to = c(".value", "year"), names_pattern = "([^\\.]*)\\.*(\\d{4})")

The result is long version of the data except that I lose country and region variables. What do I do wrong? Or how else can I do this better?

Thank you in advance.

score 1 · Accepted Answer · answered Apr 03 '22 at 14:57

We may change the regex pattern to match one or more non-digits(\\D+) as the first capture group and one or more digits (\\d+) as the second one

librarytidyr)
pivot_longer(data, cols = c(-country, -region),
    names_to = c(".value", "year"), names_pattern = "(\\D+)(\\d+)")

-output

# A tibble: 2 × 5
  country region    year    abc   xyz
  <chr>   <chr>     <chr> <int> <dbl>
1 Japan   East Asia 2001      1   4.5
2 Japan   East Asia 2002      2   5.5

data

data <- structure(list(country = "Japan", region = "East Asia", abc2001 = 1L, 
    abc2002 = 2L, xyz2001 = 4.5, xyz2002 = 5.5), 
class = "data.frame", row.names = c(NA, 
-1L))

TarJae · Answer 2 · 2022-04-03T15:48:28.183

Update: see comments as @akrun noted, here is better regex with lookaround:

rename_with(., ~str_replace(names(data), "(?<=\\D)(?=\\d)", "\\_"))

First answer:

Here is a version with names_sep. The challenge was to add an underscore in the column names. The preferred answer is that of @akrun:

(.*) - Group 1: any zero or more chars as many as possible
(\\d{4}$) - Group 2: for digits at the end

library(dplyr)
library(tidyr)

data %>% 
  rename_with(., ~sub("(.*)(\\d{4}$)", "\\1_\\2", names(data))) %>% 
  pivot_longer(-c(country, region),
             names_to =c(".value","Year"),
             names_sep ="_"
             )

  country region    Year    abc   xyz
  <chr>   <chr>     <chr> <int> <dbl>
1 Japan   East Asia 2001      1   4.5
2 Japan   East Asia 2002      2   5.5

You can still do this wihtout adding a delimiter i.e. `(?<=\\D)(?=\\d)` split between the non-digit and digit using regex lookaround — akrun, Apr 03 '22 at 15:44

I lose the constant variables (including id) when using pivot_longer with multiple variables

2 Answers2

data