2

I am trying to wrangle/reorganize some multi-level grouped data from a .TXT source. Here is a generic version of my problem/question:
6x5 spreadsheet snip: three category indicators
(1 class, 3 groups (A,B,C), 2 levels (1,2)), two variables (var1, var2), twelve simulated datapoints)

I am trying to reorganize it into: one record with twelve variables, as simplified below:
1x13 spreadsheet snip: one class, twelve variables (vara11,vara12,vara21,vara22,varb11,varb12,varb21,varb22,varc11,varc12,varc21,varc22)

The following prior exchanges seem to be relevant, but I am not (as yet) an adept-enough R-user to make good sense of them. If I grasped how to "create" the scenario within R, I would understand how to "undo" it. I am stuck.

Multi-level grouped data.frame How to summarise and subset multi-level grouped dataframe in dplyr and R

RowGroup Is there a way to make Multi - Level row grouping table in R DT pacakge

Help would be greatly appreciated.

margusl
  • 7,804
  • 2
  • 16
  • 20
V-cubed
  • 23
  • 4

2 Answers2

1

pivot_wider() with .value and names_glue:

library(stringr)
library(readr)
library(dplyr)
library(tidyr)

data.frame(Class = rep(1,6),
           Group = rep(c("A","B", "C"), each = 2),
           Level = rep(1:2,3),
           Var1 = c(790, 522, 527, 661, 846, 114),
           Var2 = c(16, 34, 33, 14, 38, 46)) %>% 
  
  pivot_wider(names_from = c(Group, Level), 
                   values_from = c(Var1, Var2), 
                   names_glue = "Var{Group}{Level}{parse_number(.value)}"
                   ) %>% 
  
  select(sort(names(.)))
#> # A tibble: 1 × 13
#>   Class VarA11 VarA12 VarA21 VarA22 VarB11 VarB12 VarB21 VarB22 VarC11 VarC12 VarC21 VarC22
#>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1    790     16    522     34    527     33    661     14    846     38    114     46

Created on 2023-03-17 with reprex v2.0.2

margusl
  • 7,804
  • 2
  • 16
  • 20
0

This is a fairly straightforward pivot_wider operation with the twist of needing to pivot_longer first to get the dataframe into proper "tidy" format.

init_df <- data.frame(Class=1, 
           Group=gl(3, 2, labels = c("A", "B", "C")),
           Level=1:2,
           Var1=round(runif(6)*1000),
           Var2=round(runif(6)*1000))

library(tidyr)
init_df %>%
  pivot_longer(c(Var1, Var2), names_to = "Var", names_prefix = "Var") %>%
  pivot_wider(names_from = Group:Var, values_from = value, names_sep = "", 
              names_prefix = "Var")
# A tibble: 1 × 13
  Class VarA11 VarA12 VarA21 VarA22 VarB11 VarB12 VarB21 VarB22 VarC11 VarC12 VarC21 VarC22
  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1     1    429    630    618     85    938    395     93    658    708    859     58    106

This should work for any number of Class and Group entries.

Dubukay
  • 1,764
  • 1
  • 8
  • 13
  • To: Dubukay & margusl Thank you, I appreciate your help. My actual data is far less "well-behaved" than the generic representation. Not every Group has all levels--and I need to assign 0 values to those (not NA or NULL) because observation of 0 is actually important. From other reading/posts/comments I've seen, I understand that "looping" is not well-regarded. But I am left with the impression I need to loop-through the rows and pick-off the relevant sections and conditionally paste them into data.frame (or tibble) to get the final arrangement needed for later analysis. Trying to learn. – V-cubed Mar 20 '23 at 23:10
  • `pivot_wider` will replace all the missing values with NAs, so all that you should need to do afterward is `dataset[is.na(dataset)] <- 0` to replace the NAs with zeroes. – Dubukay Mar 20 '23 at 23:19