0

I apologize if this is a duplicate but I can't seem to make any solutions I've found work for my problem. My problem seems simple, but I just can't seem to make anything give me the output I need.

I have a dataframe with all of the variables stacked like so:

Variable Old_Value New Value
Var1 A1 A2
Var1 A3 A4
Var1 A5 A6
Var2 B1 B2
Var2 B3 B4
Var2 B5 B6

And I just want each Variable in its own column, like so:

Var1_Old Var1_New Var2_Old Var2_New
A1 A2 B1 B2
A3 A4 B3 B4
A5 A6 B5 B6

I've played around with spread, dcast, pivot_wider, but am running into issues because I don't have a key value. I'm sure I'm missing something simple....

Thank you for your help!

Garen Pledge
  • 113
  • 1
  • 6
  • Does this answer your question? [How to reshape data from long to wide format](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) – deschen Mar 23 '21 at 18:14
  • Is there any reason why the first result row is `A1, A2, B1, B2` instead of `A1, A2, B3, B4`? Your transformation seems to have some sort of implied link between (in the original data) the first row of `Var1` and the first row of `Var2` - which normally would be coded as a key value. My question here is trying to understand whether that link is deliberate or if it doesn't really matter. Your example also seems to assume that there are an equal number of `Var1` rows and `Var2` rows - is this the case? The simplest solution is probably to just create a key value column... – Gregor Thomas Mar 23 '21 at 18:16
  • Gregor - To answer your questions, 1. There is no implied link between the first row of Var 1 and the first row of Var2. I am simply trying to unstack variable values. And 2. The number of rows per var could change, i.e., there could be multiple values for each var that go from old to new. Thank you for your help! – Garen Pledge Mar 23 '21 at 18:23
  • deschen - Thank you for the suggestion! I looked at that example and couldn't make it work because there is no key value that ties everything together. But like I mentioned in my post, I feel like I'm missing something simple. – Garen Pledge Mar 23 '21 at 18:24

1 Answers1

0

This is how I would have done it:

library(tibble)
library(tidyr)
tibble(
  variable = paste0('Var', rep(1:2, each = 3)),
  old_value = c(paste0(rep(c('A', 'B'), each = 3), c(1,3,5))),
  new_value = c(paste0(rep(c('A', 'B'), each = 3), c(2,4,6))),
) %>% 
  gather('k', 'v', -1) %>% 
  unite('tmp', variable, k) %>%
  mutate(
    tmp2 = rep(1:3, 4)
  ) %>% 
  spread(tmp, v) %>% 
  select(-tmp2)

Output:

# A tibble: 3 x 4
  Var1_new_value Var1_old_value Var2_new_value Var2_old_value
  <chr>          <chr>          <chr>          <chr>         
1 A2             A1             B2             B1            
2 A4             A3             B4             B3            
3 A6             A5             B6             B5
Baraliuh
  • 593
  • 3
  • 12
  • Thank you so much! In your solution Tmp2 becomes the key value to use spread. Makes sense. Can I ask a quick follow up? Would there be an easy way to modify tmp2 if Var1 and Var2 had a different number of values? Basically instead of tmp2 taking on values between 1 and 3, tmp2 would take on values of 1 to x where x is the max number of rows per Var? – Garen Pledge Mar 23 '21 at 18:51
  • The spread variable is tmp, tmp2 is used to make rows unique. If var1 or var2 have different lengths, the code needs to be somewhat modified so that tmp2 gives a unique integer for each var1 and var2. This will introduce NA values in the final output. – Baraliuh Mar 23 '21 at 18:56
  • Got it, thanks for the clarification and help! – Garen Pledge Mar 23 '21 at 19:01
  • FYI, `spread` and `gather` have been superseded by `pivot_wider` and `pivot_longer`. – LMc Mar 23 '21 at 19:33
  • Yeah, an old habit I guess. Though, I usually find `spread` to be faster than `pivot_longer` maybe just because I am not as used to it. – Baraliuh Mar 24 '21 at 17:21