Pivot_wider function (tidyr r package) from multiple variables

Question

I would like to reshape the data frame to wide format, using two variables as the identifying columns (this may not even be necessary). I mention it because the original df has 480 rows and several sub-levels.

This returns a warning:

library(tidyr)
library(dplyr)
                                                                
df <- structure(list(ID = c(1, 2, 3, 4), Gender = c("Men", "Women", "Men", 
"Women"), Country = c("Austria", "Austria", "Austria", "Austria"
), Season_ID = c("2011", "2012", "2011", "2012"), Region_UN = c("A", 
"B", "A", "B")), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))

df_wide <- df %>%
  pivot_wider(names_from = Gender,
              values_from = Region_UN,
              id_cols = c(Country, Season_ID))

Warning message: Values are not uniquely identified; output will contain list-cols.

Use values_fn = list to suppress this warning.
Use values_fn = length to identify where the duplicates arise
Use values_fn = {summary_fun} to summarise duplicates

I don't know which function I should pass to values_fn!

tpetzoldt · Answer 1 · 2021-05-13T19:39:36.530

6

You can also paste it together:

df_wide <- df %>%
  pivot_wider(names_from = Gender,
              values_from = Region_UN,
              id_cols = c(Country, Season_ID),
              values_fn = function(x) paste(x, collapse=","))

df_wide

And since the duplicated values in each cell are identical, you can also keep just the first one:

df_wide <- df %>%
  pivot_wider(names_from = Gender,
              values_from = Region_UN,
              id_cols = c(Country, Season_ID),
              values_fn = first)
df_wide
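Before picking a summary function, it may help to check where the duplicates actually arise. This is a minimal sketch, assuming the `df` from the question (rebuilt here with `tibble()` so the block runs on its own); `values_fn = length` is the option the warning itself suggests, and `dup_counts` is only an illustrative name:

```r
library(tidyr)
library(dplyr)

# Same data as in the question, rebuilt so the example is self-contained
df <- tibble(
  ID        = c(1, 2, 3, 4),
  Gender    = c("Men", "Women", "Men", "Women"),
  Country   = "Austria",
  Season_ID = c("2011", "2012", "2011", "2012"),
  Region_UN = c("A", "B", "A", "B")
)

# Count how many values fall into each output cell;
# a count above 1 marks a duplicated id_cols/Gender combination
dup_counts <- df %>%
  pivot_wider(names_from = Gender,
              values_from = Region_UN,
              id_cols = c(Country, Season_ID),
              values_fn = length)

dup_counts
```

Cells with a count greater than 1 are the ones that trigger the warning; if the values behind them are all identical, `first` is a reasonable choice, otherwise pasting or another summary may be safer.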

edited May 13 '21 at 19:39

answered May 13 '21 at 19:34

tpetzoldt


Thanks. The second option is the best. – Cristiano May 13 '21 at 20:37

score 3 · Accepted Answer · answered May 13 '21 at 19:25


We can create a sequence column so that each row is uniquely identified:

library(dplyr)
library(tidyr)
library(data.table)  # provides rowid()

df %>%
  # drop ID and number the rows within each Country/Season_ID combination
  mutate(ID = NULL, rn = rowid(Country, Season_ID)) %>%
  pivot_wider(names_from = Gender,
              values_from = Region_UN,
              id_cols = c(rn, Country, Season_ID))

answered May 13 '21 at 19:25

akrun



Thank you very much. This worked perfectly – Cristiano May 13 '21 at 20:36
What is the advantage of the sequential column? – Cristiano May 13 '21 at 20:41

If there are duplicates within id_cols, pivot_wider cannot tell which output row each duplicated value belongs to. The sequence column makes every combination unique, so each element gets its own position. – akrun May 13 '21 at 20:47
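The same idea can probably be expressed without data.table. This is a minimal sketch, assuming the `df` from the question (rebuilt here so the block is self-contained); `row_number()` inside `group_by()` is swapped in for `rowid()`, and `df_seq` is only an illustrative name:

```r
library(tidyr)
library(dplyr)

# Same data as in the question
df <- tibble(
  ID        = c(1, 2, 3, 4),
  Gender    = c("Men", "Women", "Men", "Women"),
  Country   = "Austria",
  Season_ID = c("2011", "2012", "2011", "2012"),
  Region_UN = c("A", "B", "A", "B")
)

df_seq <- df %>%
  group_by(Country, Season_ID) %>%
  mutate(rn = row_number()) %>%   # 1, 2, ... within each Country/Season_ID pair
  ungroup() %>%
  pivot_wider(names_from = Gender,
              values_from = Region_UN,
              id_cols = c(rn, Country, Season_ID))

df_seq
```

Because `rn` is part of `id_cols`, every combination of `rn`, `Country` and `Season_ID` occurs at most once per `Gender`, so no `values_fn` is needed and the result contains ordinary columns rather than list-cols.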
