-1

got a pretty basic question to ask unfortunately, I am trying to use a pivot_wider to make my data into a panel.

variable id reports gp & ge every year, the column t denotes the year. I want a separate variable gp_t and ge_t for every year in the data(i.e. t =[2011 -2013])

When I use pivot_wider, I am getting the right number of columns, but the rows are not collapsing into each other as they have in the past when I have used this function. Can someone please give me a hand? Because I am using t as the "names_from" for 2 different columns, I duplicated it... was this the wrong thing to do?

df$t2<-df$t

df<-df %>%
select(id,t,t2,gp,ge)%>%
group_by(id)%>%
pivot_wider(names_from = "t", names_prefix= "gp_" values_from = "gp")%>%
pivot_wider(names_from = "t2", names_prefix= "ge_" values_from = "ge")

NB: company 01 went out of business in 2013, this needs to be kept visible in the table if possible!

I am currently getting this:

id gp_2011 gp_2012 gp_2013 ge_2011 ge_2012 ge_2013
01 25 NA NA 12 NA NA
01 NA 32 NA NA 22 NA
02 95 NA NA 62 NA NA
02 NA 73 NA NA 41 NA
02 NA NA 68 NA NA 55
03 24 NA NA 16 NA NA
03 NA 34 NA NA 22 NA
03 NA NA 41 NA NA 20

I want this:

id gp_2011 gp_2012 gp_2013 ge_2011 ge_2012 ge_2013
01 25 32 NA 12 22 NA
02 95 73 68 62 41 55
03 24 34 41 16 22 20

Any help or handy hints greatly appreciated!

Thanks!

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
Gilrob
  • 93
  • 7
  • 2
    Can you share the sample input for this output? `dput()` would be the nicest way to share it as it is easily copy/pasteable, `dput(your_sample_data[1:9, ])` for the first 9 rows. – Gregor Thomas May 20 '22 at 05:21
  • What do you have? – Onyambu May 20 '22 at 05:22
  • @GregorThomas I am sorry to say I cannot! The data I am working with is sensitive :( – Gilrob May 20 '22 at 05:45
  • Welcome to SO! Please, [edit] your question and provide a [mcve] in order to reproduce your question and to verify the answers are correct. No one asks to disclose your original data but if you expect answers we need something to work with. Thank you. – Uwe May 20 '22 at 06:38
  • Surely you can share whatever sample input corresponds to the sample output you have already shared?? It's difficult to help when you only provide output, not input. Looks like Stefan went to the trouble to construct it for you, but in the future please provide sample input to make it easy for people to help you. – Gregor Thomas May 20 '22 at 17:42
  • @GregorThomas apologies I didn't really understand your point, (pretty new to this) will keep it in mind for future. – Gilrob May 23 '22 at 00:32
  • @Uwe thanks for the help and the link! Duly noted, will get it right next time :) thanks! – Gilrob May 23 '22 at 00:32
  • @Gilrob my point is that you already shared some data. You shared only the **output** that you are seeking. Presumably the information you shared in the output is not "sensitive". I wasn't asking you to share your entire sensitive data, just whatever small sample of **input** corresponds to the output that you already shared. When you don't share input, you are making it hard for people to help you: anyone who wants to help has an extra step of reconstructing your input--as Stefan did in the answer. So next time, please share input. – Gregor Thomas May 23 '22 at 13:50
  • 1
    @GregorThomas understood, thanks for taking the time to explain it to me, will do in future! – Gilrob May 24 '22 at 01:52

1 Answers1

1

There is neither need to duplicate your t column nor to use two pivot_wider. Instead you could achieve your desired result like so:

library(dplyr)
library(tidyr)

df %>%
  select(id, t, gp, ge) %>%
  pivot_wider(names_from = "t", values_from = c(gp, ge))
#> # A tibble: 3 × 7
#>   id    gp_2011 gp_2012 gp_2013 ge_2011 ge_2012 ge_2013
#>   <chr>   <int>   <int>   <int>   <int>   <int>   <int>
#> 1 01         25      32      NA      12      22      NA
#> 2 02         95      73      68      62      41      55
#> 3 03         24      34      41      16      22      20

DATA

df_wide <- data.frame(
                id = c("01", "02", "03"),
           gp_2011 = c(25L, 95L, 24L),
           gp_2012 = c(32L, 73L, 34L),
           gp_2013 = c(NA, 68L, 41L),
           ge_2011 = c(12L, 62L, 16L),
           ge_2012 = c(22L, 41L, 22L),
           ge_2013 = c(NA, 55L, 20L)
           )

library(tidyr)

df <- df_wide %>%
  pivot_longer(-id, names_to = c(".value", "t"), names_sep = "_")
stefan
  • 90,330
  • 6
  • 25
  • 51