1

I've looked at several solutions on here but they don't work for me and I don't quite understand how to vary them.

I have a very wide data structure, which looks like this (but even wider):

   Key  Metadata Var1  Var1_remark Var2      Var2_remark Var3  Var3_remark Var4      Var4_remark
1 Key1 Metadata1  0.0         <NA> 0.50 Present in A,C,E 0.00         <NA> 0.50 Present in B,D,F
2 Key2 Metadata2  0.0         <NA> 0.33     Present in A 0.33 Present in B 0.33     Present in C
3 Key3 Metadata3  0.5 Present in A 0.50     Present in B 0.00         <NA> 0.00             <NA>

Code to generate example:

Key <- c("Key1", "Key2","Key3")
Metadata <- c("Metadata1","Metadata2", "Metadata3")
Var1 <- c(0, 0, 0.5)
Var1_remark <- c(NA,NA,"Present in A")
Var2 <- c(0.5, 0.33, 0.5)
Var2_remark <- c("Present in A,C,E", "Present in A", "Present in B")
Var3 <- c(0, 0.33, 0)
Var3_remark <- c(NA,"Present in B",NA)
Var4 <- c(0.5,0.33,0)
Var4_remark <- c("Present in B,D,F", "Present in C", NA)
wide.data <- data.frame(Key, Metadata, Var1, Var1_remark, Var2, 
       Var2_remark, Var3, Var3_remark, Var4, Var4_remark)

I want to turn it into the following long format, where column names are split into Parameter (which hosts the variable name), Value (the value of the variable) and Remark (which hosts the *_remark column):

   Key  Metadata Parameter  Value             Remark
1 Key1 Metadata1      Var1   0.00               <NA>
2 Key1 Metadata1      Var2   0.50   Present in A,C,E
3 Key1 Metadata1      Var3   0.00               <NA>
4 Key1 Metadata1      Var4   0.50   Present in B,D,F
1 Key2 Metadata2      Var1   0.00               <NA>
2 Key2 Metadata2      Var2   0.33       Present in A
3 Key2 Metadata2      Var3   0.33       Present in B
4 Key2 Metadata2      Var4   0.33       Present in C
1 Key3 Metadata3      Var1   0.50       Present in A
2 Key3 Metadata3      Var2   0.50       Present in B
3 Key3 Metadata3      Var3   0.00               <NA>
4 Key3 Metadata3      Var4   0.00               <NA>

I have tried both melt and pivot_longer (many times), but haven't been able to generate my desired output. The closest I have come is with pivot_longer but either the "Value" column I want is missing:

Test 1 (no "Value" column):

pivot_longer(wide.data, cols=-c('Key','Metadata'), 
  names_to = c('Parameter','.value'), names_sep = '_')

or there is not a single Value column (Var1, Var2, Var3, Var4 kept as independent columns):

Test 2:

pivot_longer(wide.data, cols=ends_with('_remark'), 
    names_to = c('Parameter','.value'), names_sep = '_')

How do I get to my desired output given wide.data?

akrun
  • 874,273
  • 37
  • 540
  • 662

2 Answers2

4

Update: Improved version: values_transform = list(val = as.character)) and removed other lines:

df %>%
  pivot_longer(cols = contains("remark"),
               names_to = c("Parameter", ".value"), 
               names_sep = "_",
               values_transform = list(val = as.character)) %>%  
  select(Key, Metadata, Parameter, Remark = remark) %>% 
  bind_cols(df %>% 
              pivot_longer(c(Var1, Var2, Var3, Var4),
                           values_transform = list(val = as.character)) %>% 
              select(Value = value)
            )
   Key   Metadata  Parameter Remark           Value
   <chr> <chr>     <chr>     <chr>            <dbl>
 1 Key1  Metadata1 Var1      NA                0   
 2 Key1  Metadata1 Var2      Present in A,C,E  0.5 
 3 Key1  Metadata1 Var3      NA                0   
 4 Key1  Metadata1 Var4      Present in B,D,F  0.5 
 5 Key2  Metadata2 Var1      NA                0   
 6 Key2  Metadata2 Var2      Present in A      0.33
 7 Key2  Metadata2 Var3      Present in B      0.33
 8 Key2  Metadata2 Var4      Present in C      0.33
 9 Key3  Metadata3 Var1      Present in A      0.5 
10 Key3  Metadata3 Var2      Present in B      0.5 
11 Key3  Metadata3 Var3      NA                0   
12 Key3  Metadata3 Var4      NA                0 

First answer:

Finally it was not as easy as I first thought, here is a working version. I am not sure if it is the best!

library(dplyr)
library(tidyr)

df %>%
  mutate(across(everything(), as.character)) %>% 
  pivot_longer(cols = contains("remark"),
               names_to = c("Parameter", ".value"), 
               names_sep = "_") %>%  
  group_by(Key,Parameter) %>% 
  slice(1) %>% 
  ungroup() %>% 
  select(Key, Metadata, Parameter, Remark = remark) %>% 
  bind_cols(df %>%
              mutate(across(everything(), as.character)) %>% 
              pivot_longer(c(Var1, Var2, Var3, Var4)) %>% 
              select(Value = value)
            )
   Key   Metadata  Parameter Remark           Value
   <chr> <chr>     <chr>     <chr>            <chr>
 1 Key1  Metadata1 Var1      NA               0    
 2 Key1  Metadata1 Var2      Present in A,C,E 0.5  
 3 Key1  Metadata1 Var3      NA               0    
 4 Key1  Metadata1 Var4      Present in B,D,F 0.5  
 5 Key2  Metadata2 Var1      NA               0    
 6 Key2  Metadata2 Var2      Present in A     0.33 
 7 Key2  Metadata2 Var3      Present in B     0.33 
 8 Key2  Metadata2 Var4      Present in C     0.33 
 9 Key3  Metadata3 Var1      Present in A     0.5  
10 Key3  Metadata3 Var2      Present in B     0.5  
11 Key3  Metadata3 Var3      NA               0    
12 Key3  Metadata3 Var4      NA               0  
TarJae
  • 72,363
  • 6
  • 19
  • 66
2

Things are a bit easier if you rename your Var1 columns to Var1_value. Here we use rename to take care of this for us and then then pivot_longer is more direct. Try

library(dplyr)
library(tidyr)
wide.data %>% 
  rename_with(.cols=matches("Var\\d+$"), ~paste0(.x, "_value")) %>% 
  pivot_longer(-c(Key, Metadata), names_sep="_", names_to=c("parameter", ".value"))

and the output is

   Key   Metadata  parameter value remark          
   <chr> <chr>     <chr>     <dbl> <chr>           
 1 Key1  Metadata1 Var1       0    NA              
 2 Key1  Metadata1 Var2       0.5  Present in A,C,E
 3 Key1  Metadata1 Var3       0    NA              
 4 Key1  Metadata1 Var4       0.5  Present in B,D,F
 5 Key2  Metadata2 Var1       0    NA              
 6 Key2  Metadata2 Var2       0.33 Present in A    
 7 Key2  Metadata2 Var3       0.33 Present in B    
 8 Key2  Metadata2 Var4       0.33 Present in C    
 9 Key3  Metadata3 Var1       0.5  Present in A    
10 Key3  Metadata3 Var2       0.5  Present in B    
11 Key3  Metadata3 Var3       0    NA              
12 Key3  Metadata3 Var4       0    NA   

as desired.

MrFlick
  • 195,160
  • 17
  • 277
  • 295