How do I specify pivot_wider for an entire dataframe?

Question

I am able to pivot_wider for a specific column using the following:

new_df <- pivot_wider(old_df, names_from = col10, values_from = value_col, values_fn = list)

I would like to pivot_wider with every column in a dataframe (minus an id column). What is the best way to do this? Should I use a loop or is there a way that this function takes the whole dataframe?

To clarify, using the below sample dataframes, I am able to go from old_df to new_df using the pivot_wider function I listed above. I would like to now go from old_df2 to new_df2.

old_df <- structure(list(id = c("1", "1", "2"), col10 = c("yellow", 
"green", "green"), value_col = c("1", "1", "1")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))

old_df2 <- structure(list(id = c("1", "1", "2"), col10 = c("yellow", 
"green", "green"), col11 = c("dog", 
"cat", "dog"), value_col = c("1", "1", "1")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))

new_df <- pivot_wider(old_df, names_from = col10, values_from = value_col, values_fn = list)

new_df2 <- structure(list(id = c("1", "2"), yellow = c("1", "NULL"), green = c("1", "1"), dog = c("1", "1"), cat = c("1", "NULL")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))

Dear Ava, it would be much easier to help you if you could just share a reproducible sample of your data with `dput(head(data))`. — Anoushiravan R, Aug 25 '21 at 15:05
Thanks @AnoushiravanR, I have just updated the post with a reproducible sample of my data. — piper180, Aug 25 '21 at 17:32
Maybe it would've been better to provide more thorough data set, since this sample data has only 3 observations and may not be a good representative of your original data set or any possible scenario that may come up. — Anoushiravan R, Aug 25 '21 at 17:38
I added a new column (col11). I would like to be able to use pivot_widen on the data in both col10 and col11 (and all other columns in my larger dataframe). Hope that's a bit more helpful! — piper180, Aug 25 '21 at 17:42
Would you please tell me how you would like your new column names to be? For example `yellow_dog`? — Anoushiravan R, Aug 25 '21 at 17:46
I'd like separate columns for each in the new_df. So in this case it would be id, dog, cat, yellow, and green each as their own column. — piper180, Aug 25 '21 at 17:55
Ok check my updates to see if that's what you are looking for. — Anoushiravan R, Aug 25 '21 at 17:59
Thanks @AnoushiravanR, I've updated my original post with some sample dataframes formatted to show what I'm looking for. I'm looking to go from old_df2 to new_df2 (I'm able to go from old_df to new_df). — piper180, Aug 25 '21 at 18:08
Ava, Thomas' solution is in `data.table` try it with your original data set and it may be memory efficient and definitionally faster. — Anoushiravan R, Aug 25 '21 at 21:15

Anoushiravan R · Answer 1 · 2021-08-25T18:17:07.630

4

If you would like to have separate column names for each value between these two columns (or any number of columns) you first need to use pivot_longer to put all the column names into a single column and then use pivot_wider to spread them:

library(tidyr)

old_df2 %>%
  pivot_longer(!c(id, value_col), names_to = "Cols", values_to = "vals") %>%
  pivot_wider(names_from = vals, values_from = value_col) %>%
  select(-Cols) %>%
  group_by(id) %>%
  summarise(across(everything(), ~ sum(as.numeric(.x), na.rm = TRUE)))

# A tibble: 2 x 5
  id    yellow   dog green   cat
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 1          1     1     1     1
2 2          0     1     1     0

edited Aug 25 '21 at 18:17

answered Aug 25 '21 at 17:36

Anoushiravan R

21,622
3
18
41

Thank you, but this is accomplishing what I have been able to do with the script I included in my post: new_df <- pivot_wider(old_df, names_from = col10, values_from = value_col, values_fn = list). I'm looking to be able to do this for all columns in old_df. – piper180 Aug 25 '21 at 17:43
1

Sure I will make an edit now. Would you like each color name to be a combination of color and animals in your new data set? – Anoushiravan R Aug 25 '21 at 17:43
Thank you dear Thomas, no it's just simple. But I really need to learn using `reshape` and I can learn from you. After discussing with the op I guess she would like to have values in those 2 columns as separate column names. But I'm still waiting for her verification. – Anoushiravan R Aug 25 '21 at 18:04
Thanks @AnoushiravanR, the pivot_longer and then pivot_wider method that you wrote for me is almost what I need! I see multiple rows with the same id, and need just one row per id. Is there another step I need to take afterwords or something I need to do to avoid this? Really appreciate all your help on this! – piper180 Aug 25 '21 at 18:20
1

No I made an edit, please check again. After that `summarise` function at the end we will only have 1 row per id. – Anoushiravan R Aug 25 '21 at 18:22
1

Yes, that's it! Thank you very much. I see now how you are using the pivot_longer and pivot_wider together. Thank you, I appreciate your help! – piper180 Aug 25 '21 at 18:26
1

My pleasure, glad it helped. If you apply `pivot_wider` right away your column names will be like a combination of all values separated by underscore or anything you specify explicitly but you need to pivot your data in long format first. – Anoushiravan R Aug 25 '21 at 18:28
1

Yes exactly, realizing now that I had run into exactly that issue earlier. – piper180 Aug 25 '21 at 18:29
Any suggestions on this error as a result of this method on the actual df? "Error: Internal error in `compact_rep()`: Negative `n` in `compact_rep()`. Run `rlang::last_error()` to see where the error occurred." I'm thinking the resulting df is too large for RStudio? Is that a possibility? – piper180 Aug 25 '21 at 18:52
I'm not sure. Could you please try the code on your data set with half of the observations for example. – Anoushiravan R Aug 25 '21 at 20:43

ThomasIsCoding · Accepted Answer · 2021-08-25T18:14:18.667

3

Update 1

As per your update, here comes with a data.table option

dcast(
  melt(setDT(old_df),
    id.var = "id",
    measure.vars = patterns("^col\\d+")
  ),
  id ~ value,
  fun.aggregate = length,
  fill = NA
)

which gives

   id cat dog green yellow
1:  1   1   1     1      1
2:  2  NA   1     1     NA

Are you looking for something like below?

reshape(
  transform(
    old_df,
    q = ave(id, id, FUN = seq_along)
  ),
  direction = "wide",
  idvar = "id",
  timevar = "q"
)

The output is

  id col10.1 col11.1 value_col.1 col10.2 col11.2 value_col.2
1  1  yellow     dog           1   green     cat           1
3  2   green     dog           1    <NA>    <NA>        <NA>

edited Aug 25 '21 at 18:14

answered Aug 25 '21 at 18:00

ThomasIsCoding

96,636
9
24
81

Thanks @ThomasIsCoding, I've updated my original post with some example dataframes on what exactly I'm looking for. – piper180 Aug 25 '21 at 18:08
1

@ava Thanks for your update. You can check my update as well. – ThomasIsCoding Aug 25 '21 at 18:14
Thanks @ThomasIsCoding, I'm looking to get the values as column names in the new dataframe. – piper180 Aug 25 '21 at 18:23
your solutions seems to somewhat be close to working for my actual dataset! A couple questions for you: 1) is measure.vars indicating all the columns I would like to include or omit? I noticed you used the regular expression notation ^col\\d+ -- just curious if it's the same as 'col\\d+? 2) what is length referring to in the fun.aggregate = length? For some reason I'm getting values of 3 instead of 1 in my final output using my actual data, so want to make sure I'm accounting for the right set of columns. – piper180 Aug 26 '21 at 19:38
1

@ava 1) `measure.vars` indicates all columns that you ant to melt with, and `^` in `^col\\d` means the names start with `col`. 2) `length` refers to the number of elements within an aggregation group. If you just want to check the existence, you should try `function(x) +(length(x)>1)`. – ThomasIsCoding Aug 26 '21 at 21:19

score 2 · Answer 3 · answered Aug 25 '21 at 18:26

2

You could combine those columns and unnest them followed by pivot_wider:

library(tidyr)
library(dplyr)

old_df2 <- structure(list(id = c("1", "1", "2"), col10 = c("yellow", 
                                                           "green", "green"), col11 = c("dog", 
                                                                                        "cat", "dog"), value_col = c("1", "1", "1")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))


old_df2 %>% 
  mutate(new_col = strsplit(paste(col10, col11, sep = "_"), "_"), .keep = "unused") %>% 
  unnest(new_col) %>% 
  pivot_wider(names_from = new_col, values_from = value_col)

#> # A tibble: 2 x 5
#>   id    yellow dog   green cat  
#>   <chr> <chr>  <chr> <chr> <chr>
#> 1 1     1      1     1     1    
#> 2 2     <NA>   1     1     <NA>

^{Created on 2021-08-25 by the reprex package (v2.0.1)}

answered Aug 25 '21 at 18:26

Martin Gal

16,640
5
21
39

Nice brother! :D – Anoushiravan R Aug 25 '21 at 18:28
I actually didn't understand your approach, so I tried it myself and took another way. ;-) – Martin Gal Aug 25 '21 at 18:31
1

No the op wanted to have values in those 2 columns as new column names so I thought I would put them first in one column via `pivot_longer` and then spread them. – Anoushiravan R Aug 25 '21 at 20:45
Do you have any solution for this: https://stackoverflow.com/questions/68909574/combine-two-dfs-with-different-column-names-and-then-melt I mean using `tidyr` only. – Anoushiravan R Aug 25 '21 at 20:45
1

`tidyr` only as in "not using `dplyr`"? – Martin Gal Aug 25 '21 at 20:49
I actually meant pivoting because I could do it with a little bit of regex but in the end I got some extra observations, thought you might have an idea here is my code `df1 %>% bind_cols(df2) %>% pivot_longer(!c(weight, area), names_to = c(".value"), names_pattern = "([[:alpha:]]+)\\d?") %>% pivot_longer(!c(sd, treatment), names_to = "var", values_to = "sum")`. – Anoushiravan R Aug 25 '21 at 20:52
1

I would try something like `df2 %>% rename(sd = sd2, treatment = treatment2) %>% pivot_longer(area, names_to = "var", values_to = "sum") %>% bind_rows(pivot_longer(df1, weight, names_to = "var", values_to = "sum")) %>% select(treatment, var, sum, sd)`. But, well... it's not `tidyr` only. ;-) – Martin Gal Aug 25 '21 at 21:05
It's ok just post it then I will review it again :) – Anoushiravan R Aug 25 '21 at 21:05

How do I specify pivot_wider for an entire dataframe?

3 Answers3

Update 1