2

I am able to pivot_wider for a specific column using the following:

new_df <- pivot_wider(old_df, names_from = col10, values_from = value_col, values_fn = list)

I would like to pivot_wider with every column in a dataframe (minus an id column). What is the best way to do this? Should I use a loop or is there a way that this function takes the whole dataframe?

To clarify, using the below sample dataframes, I am able to go from old_df to new_df using the pivot_wider function I listed above. I would like to now go from old_df2 to new_df2.

old_df <- structure(list(id = c("1", "1", "2"), col10 = c("yellow", 
"green", "green"), value_col = c("1", "1", "1")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))

old_df2 <- structure(list(id = c("1", "1", "2"), col10 = c("yellow", 
"green", "green"), col11 = c("dog", 
"cat", "dog"), value_col = c("1", "1", "1")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))

new_df <- pivot_wider(old_df, names_from = col10, values_from = value_col, values_fn = list)

new_df2 <- structure(list(id = c("1", "2"), yellow = c("1", "NULL"), green = c("1", "1"), dog = c("1", "1"), cat = c("1", "NULL")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
piper180
  • 329
  • 2
  • 12
  • Dear Ava, it would be much easier to help you if you could just share a reproducible sample of your data with `dput(head(data))`. – Anoushiravan R Aug 25 '21 at 15:05
  • 1
    Thanks @AnoushiravanR, I have just updated the post with a reproducible sample of my data. – piper180 Aug 25 '21 at 17:32
  • Maybe it would've been better to provide more thorough data set, since this sample data has only 3 observations and may not be a good representative of your original data set or any possible scenario that may come up. – Anoushiravan R Aug 25 '21 at 17:38
  • I added a new column (col11). I would like to be able to use pivot_widen on the data in both col10 and col11 (and all other columns in my larger dataframe). Hope that's a bit more helpful! – piper180 Aug 25 '21 at 17:42
  • Would you please tell me how you would like your new column names to be? For example `yellow_dog`? – Anoushiravan R Aug 25 '21 at 17:46
  • I'd like separate columns for each in the new_df. So in this case it would be id, dog, cat, yellow, and green each as their own column. – piper180 Aug 25 '21 at 17:55
  • Ok check my updates to see if that's what you are looking for. – Anoushiravan R Aug 25 '21 at 17:59
  • Thanks @AnoushiravanR, I've updated my original post with some sample dataframes formatted to show what I'm looking for. I'm looking to go from old_df2 to new_df2 (I'm able to go from old_df to new_df). – piper180 Aug 25 '21 at 18:08
  • I made another updates. – Anoushiravan R Aug 25 '21 at 18:20
  • 1
    Ava, Thomas' solution is in `data.table` try it with your original data set and it may be memory efficient and definitionally faster. – Anoushiravan R Aug 25 '21 at 21:15

3 Answers3

4

If you would like to have separate column names for each value between these two columns (or any number of columns) you first need to use pivot_longer to put all the column names into a single column and then use pivot_wider to spread them:

library(tidyr)

old_df2 %>%
  pivot_longer(!c(id, value_col), names_to = "Cols", values_to = "vals") %>%
  pivot_wider(names_from = vals, values_from = value_col) %>%
  select(-Cols) %>%
  group_by(id) %>%
  summarise(across(everything(), ~ sum(as.numeric(.x), na.rm = TRUE)))

# A tibble: 2 x 5
  id    yellow   dog green   cat
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 1          1     1     1     1
2 2          0     1     1     0
Anoushiravan R
  • 21,622
  • 3
  • 18
  • 41
  • Thank you, but this is accomplishing what I have been able to do with the script I included in my post: new_df <- pivot_wider(old_df, names_from = col10, values_from = value_col, values_fn = list). I'm looking to be able to do this for all columns in old_df. – piper180 Aug 25 '21 at 17:43
  • 1
    Sure I will make an edit now. Would you like each color name to be a combination of color and animals in your new data set? – Anoushiravan R Aug 25 '21 at 17:43
  • Thank you dear Thomas, no it's just simple. But I really need to learn using `reshape` and I can learn from you. After discussing with the op I guess she would like to have values in those 2 columns as separate column names. But I'm still waiting for her verification. – Anoushiravan R Aug 25 '21 at 18:04
  • Thanks @AnoushiravanR, the pivot_longer and then pivot_wider method that you wrote for me is almost what I need! I see multiple rows with the same id, and need just one row per id. Is there another step I need to take afterwords or something I need to do to avoid this? Really appreciate all your help on this! – piper180 Aug 25 '21 at 18:20
  • 1
    No I made an edit, please check again. After that `summarise` function at the end we will only have 1 row per id. – Anoushiravan R Aug 25 '21 at 18:22
  • 1
    Yes, that's it! Thank you very much. I see now how you are using the pivot_longer and pivot_wider together. Thank you, I appreciate your help! – piper180 Aug 25 '21 at 18:26
  • 1
    My pleasure, glad it helped. If you apply `pivot_wider` right away your column names will be like a combination of all values separated by underscore or anything you specify explicitly but you need to pivot your data in long format first. – Anoushiravan R Aug 25 '21 at 18:28
  • 1
    Yes exactly, realizing now that I had run into exactly that issue earlier. – piper180 Aug 25 '21 at 18:29
  • Any suggestions on this error as a result of this method on the actual df? "Error: Internal error in `compact_rep()`: Negative `n` in `compact_rep()`. Run `rlang::last_error()` to see where the error occurred." I'm thinking the resulting df is too large for RStudio? Is that a possibility? – piper180 Aug 25 '21 at 18:52
  • I'm not sure. Could you please try the code on your data set with half of the observations for example. – Anoushiravan R Aug 25 '21 at 20:43
3

Update 1

As per your update, here comes with a data.table option

dcast(
  melt(setDT(old_df),
    id.var = "id",
    measure.vars = patterns("^col\\d+")
  ),
  id ~ value,
  fun.aggregate = length,
  fill = NA
)

which gives

   id cat dog green yellow
1:  1   1   1     1      1
2:  2  NA   1     1     NA

Are you looking for something like below?

reshape(
  transform(
    old_df,
    q = ave(id, id, FUN = seq_along)
  ),
  direction = "wide",
  idvar = "id",
  timevar = "q"
)

The output is

  id col10.1 col11.1 value_col.1 col10.2 col11.2 value_col.2
1  1  yellow     dog           1   green     cat           1
3  2   green     dog           1    <NA>    <NA>        <NA>
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
  • Thanks @ThomasIsCoding, I've updated my original post with some example dataframes on what exactly I'm looking for. – piper180 Aug 25 '21 at 18:08
  • 1
    @ava Thanks for your update. You can check my update as well. – ThomasIsCoding Aug 25 '21 at 18:14
  • Thanks @ThomasIsCoding, I'm looking to get the values as column names in the new dataframe. – piper180 Aug 25 '21 at 18:23
  • your solutions seems to somewhat be close to working for my actual dataset! A couple questions for you: 1) is measure.vars indicating all the columns I would like to include or omit? I noticed you used the regular expression notation ^col\\d+ -- just curious if it's the same as 'col\\d+? 2) what is length referring to in the fun.aggregate = length? For some reason I'm getting values of 3 instead of 1 in my final output using my actual data, so want to make sure I'm accounting for the right set of columns. – piper180 Aug 26 '21 at 19:38
  • 1
    @ava 1) `measure.vars` indicates all columns that you ant to melt with, and `^` in `^col\\d` means the names start with `col`. 2) `length` refers to the number of elements within an aggregation group. If you just want to check the existence, you should try `function(x) +(length(x)>1)`. – ThomasIsCoding Aug 26 '21 at 21:19
2

You could combine those columns and unnest them followed by pivot_wider:

library(tidyr)
library(dplyr)

old_df2 <- structure(list(id = c("1", "1", "2"), col10 = c("yellow", 
                                                           "green", "green"), col11 = c("dog", 
                                                                                        "cat", "dog"), value_col = c("1", "1", "1")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))


old_df2 %>% 
  mutate(new_col = strsplit(paste(col10, col11, sep = "_"), "_"), .keep = "unused") %>% 
  unnest(new_col) %>% 
  pivot_wider(names_from = new_col, values_from = value_col)

#> # A tibble: 2 x 5
#>   id    yellow dog   green cat  
#>   <chr> <chr>  <chr> <chr> <chr>
#> 1 1     1      1     1     1    
#> 2 2     <NA>   1     1     <NA>

Created on 2021-08-25 by the reprex package (v2.0.1)

Martin Gal
  • 16,640
  • 5
  • 21
  • 39
  • Nice brother! :D – Anoushiravan R Aug 25 '21 at 18:28
  • I actually didn't understand your approach, so I tried it myself and took another way. ;-) – Martin Gal Aug 25 '21 at 18:31
  • 1
    No the op wanted to have values in those 2 columns as new column names so I thought I would put them first in one column via `pivot_longer` and then spread them. – Anoushiravan R Aug 25 '21 at 20:45
  • Do you have any solution for this: https://stackoverflow.com/questions/68909574/combine-two-dfs-with-different-column-names-and-then-melt I mean using `tidyr` only. – Anoushiravan R Aug 25 '21 at 20:45
  • 1
    `tidyr` only as in "not using `dplyr`"? – Martin Gal Aug 25 '21 at 20:49
  • I actually meant pivoting because I could do it with a little bit of regex but in the end I got some extra observations, thought you might have an idea here is my code `df1 %>% bind_cols(df2) %>% pivot_longer(!c(weight, area), names_to = c(".value"), names_pattern = "([[:alpha:]]+)\\d?") %>% pivot_longer(!c(sd, treatment), names_to = "var", values_to = "sum")`. – Anoushiravan R Aug 25 '21 at 20:52
  • 1
    I would try something like `df2 %>% rename(sd = sd2, treatment = treatment2) %>% pivot_longer(area, names_to = "var", values_to = "sum") %>% bind_rows(pivot_longer(df1, weight, names_to = "var", values_to = "sum")) %>% select(treatment, var, sum, sd)`. But, well... it's not `tidyr` only. ;-) – Martin Gal Aug 25 '21 at 21:05
  • It's ok just post it then I will review it again :) – Anoushiravan R Aug 25 '21 at 21:05