Reshape data.frame with multiple headers

Question

My data.frame looks like this:

name	country1	country1	country2	country2	country3
	code1	code1	code2	code2	code3
	sector1	sector2	sector1	sector2	sector1
country1	###	###	###	###	###
country2	###	###	###	###	###
country3	###	###	###	###	###
country4	###	###	###	###	###

where ## are numbers.

I want to reshape it like this:

name	country	code	sector	number
country1	country2	code 2	sector 1	###
country1	country2	code 2	sector 2	###
country1	country3	code 3	sector 1	###
country1	country3	code 3	sector 2	###
country2	country2	code 2	sector 2	###
country3	country1	code 1	sector 1	###
country4	country3	code 3	sector 1	###

How can I tidy up my data set? My problem is that the raw data come with this weird format in which there are multiple headers, so that I cannot simply go with pivot_longer.

Many thanks

Can you provide your data in a reproducible format using `dput` ? Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269) — Ronak Shah, Jul 26 '21 at 10:37
You can download the data from : https://worldmrio.com/unctadgvc/ . It is the third dataset available. I opened the .txt file with `read.delim` and `sep = "/t"` — nflore, Jul 26 '21 at 10:45
I made a final attempt to solve your problem. Take a look at the edit at the bottom of my answer. — Martin Gal, Jul 26 '21 at 13:16

Martin Gal · Answer 1 · 2021-07-26T12:43:06.027

0

I believe there are better ways but you can use dplyr:

data %>% 
  slice(1:2) %>% 
  mutate(across(starts_with("country"), ~ paste0(.x, collapse = "_")),
         name = "name") %>% 
  slice(1) %>% 
  mutate(across(starts_with("country"), ~ paste0(cur_column(), "_", .x))) %>% 
  {`colnames<-`(data, unlist(.))} %>% 
  slice(3:n()) %>% 
  pivot_longer(starts_with("country"), 
               names_to = c("country", "code", "sector"),
               names_pattern="(country\\d+).+(code\\d+).+(sector\\d+)") %>%
  filter(name != country)

returns

# A tibble: 15 x 5
   name     country  code  sector  value
   <chr>    <chr>    <chr> <chr>   <chr>
 1 country1 country2 code2 sector1 ###  
 2 country1 country2 code2 sector2 ###  
 3 country1 country3 code3 sector1 ###  
 4 country2 country1 code1 sector1 ###  
 5 country2 country1 code1 sector2 ###  
 6 country2 country3 code3 sector1 ###  
 7 country3 country1 code1 sector1 ###  
 8 country3 country1 code1 sector2 ###  
 9 country3 country2 code2 sector1 ###  
10 country3 country2 code2 sector2 ###  
11 country4 country1 code1 sector1 ###  
12 country4 country1 code1 sector2 ###  
13 country4 country2 code2 sector1 ###  
14 country4 country2 code2 sector2 ###  
15 country4 country3 code3 sector1 ###

Note: I filtered for name != country since in your expected output those rows with matching aren't present.

Edit

One last try:

data %>% 
  slice(1:2) %>% 
  rename(name = X) %>% 
  mutate(across(c(-name), ~ paste0(.x, collapse = "_")),
         name = "name") %>% 
  slice(1) %>% 
  mutate(across(c(-name), ~ paste0(cur_column(), "_", .x))) %>% 
  {`colnames<-`(data, unlist(.))} %>% 
  slice(3:n()) %>% 
  pivot_longer(c(-name), 
               names_to = c("country", "code", "sector"),
               names_pattern="(country\\d+).+(code\\d+).+(sector\\d+)") %>%
  filter(name != country)

edited Jul 26 '21 at 12:43

answered Jul 26 '21 at 11:26

Martin Gal

16,640
5
21
39

Thanks. Unfortunately the dataset I posted is just a stylized one. The original one has actual country names as labels ("Afghanistan, Albania..), so I can't use `starts_with` to group labels. I posted the link to the original dataset in the comments to my question. – nflore Jul 26 '21 at 11:34
Replace `starts_with("country")` by `c(-name)`. – Martin Gal Jul 26 '21 at 11:45
In this case an error occurs: `Can't subset columns that don't exist` – nflore Jul 26 '21 at 12:13
Your first column is named "name" like in your example? – Martin Gal Jul 26 '21 at 12:19
It was named "X" and I tried with `c(-data$X)`. Now it is named "name" and I tried with`c(-data$name)`. Same result – nflore Jul 26 '21 at 12:23
You don't need `data$`, since this refers to the original data.frame without any transformations. Just use `X` and `name` – Martin Gal Jul 26 '21 at 12:27
Just a hint: Your sample data should represent your original data. You can show your data by using `dput(head(data))`. Usally people (including me) on SO don't want to click on external links. – Martin Gal Jul 26 '21 at 12:29
Again an error: `Error in names(x) <- value : 'names' attribute [14841] must be the same length as the vector [14840]` – nflore Jul 26 '21 at 12:37
I know, I am sorry. I tried with `dput` but the dataset has 193 observations of 14840 variables, so the output of that function was unreadable. – nflore Jul 26 '21 at 12:39

Reshape data.frame with multiple headers

1 Answers1