
I have a data frame with 6 value columns (plus an id column), and I want to partition them into 3 parts of 2 columns each. I then want to recombine the partitions in all possible combinations to create 7 new data frames:

part1,part2,part3
part1,part2
part1,part3
part2,part3
part1
part2
part3

I slightly modified the solution from "Split a dataframe into all possible combinations of dataframes by 3 columns in R" to recombine them:

frame <- data.frame(id = letters[seq(from = 1, to = 10)],
                    a = rnorm(10, 4), b = rnorm(10, 6), c = rnorm(10, 5),
                    d = rnorm(10, 2), e = rnorm(10, 5), f = rnorm(10, 2))

> frame
   id        a        b        c         d        e          f
1   a 6.322845 5.828619 5.465636 2.7658092 6.522706  1.4896078
2   b 2.352437 5.521230 6.555715 0.6612871 5.288508  2.4837969
3   c 2.790967 9.253197 3.724231 2.9954273 4.887744  1.3020424
4   d 2.017975 6.038846 4.540511 1.7989492 6.059974 -0.2463154
5   e 4.004463 4.384898 5.341084 1.9528288 4.186449  1.0823939
6   f 2.600336 6.562758 5.708489 2.1142707 6.769220  1.7942291
7   g 3.850400 7.231973 4.918542 3.3562489 6.090841  1.4202527
8   h 2.932744 6.377516 5.518261 1.7423230 4.422915  1.8789437
9   i 5.135185 5.218992 4.710196 1.1878825 5.421876  0.8455756
10  j 5.188278 7.233590 6.303500 0.3868047 4.390973  1.6997801 

m <- seq(3)
j <- function(m) {
  lapply(as.data.frame(combn(ncol(frame) - 1, m)),
         function(idx) frame[, c(1, idx + 1)])
}

lapply(m, function(m) j(m))

This creates combinations of the individual columns, mixing all of them freely. I do not want combinations of columns, but combinations of partitions. How can I achieve that?
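For a sense of scale: with m <- seq(3), the code above builds one data frame per choice of 1, 2 or 3 of the 6 value columns, whereas 3 fixed partitions only have 2^3 - 1 = 7 non-empty subsets. A quick base R count, as an illustration of the difference:

```r
# Column-level combinations produced by combn() for m = 1, 2, 3
n_cols <- 6
n_column_combos <- sum(choose(n_cols, 1:3))                   # 6 + 15 + 20

# Partition-level combinations: every non-empty subset of 3 partitions
n_parts <- 3
n_partition_combos <- sum(choose(n_parts, seq_len(n_parts)))  # 3 + 3 + 1, i.e. 2^3 - 1

c(columns = n_column_combos, partitions = n_partition_combos)
```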

user3311147

2 Answers


Here is one try:

library(dplyr)
library(purrr)

# Assign a partition to be used here
# (Updated from OP's clarification about pttns & @bouncyball's comment)
pttn <- split(names(frame)[-1], rep(1:3, each = 2))

# Create combinations of partitioned columns
do.call(c, lapply(seq_along(pttn), combn, x = pttn, simplify = FALSE)) %>% 
   map(~ frame %>% select(reduce(.x, c)))

The first line, with do.call, creates all combinations of the 'partitions', i.e. of the partitioned column names. If you want to keep the id column, use select(id, reduce(.x, c)) instead of select(reduce(.x, c)).
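To see what the do.call() step produces before any data frame is built, it can be run on its own. A minimal sketch, assuming the same frame layout as in the question (an id column followed by a to f; set.seed() is only there for reproducibility), which also shows the variant that keeps id. The names combos, col_sets and with_id are illustrative:

```r
library(dplyr)
library(purrr)

set.seed(42)  # reproducible example data
frame <- data.frame(id = letters[1:10],
                    a = rnorm(10, 4), b = rnorm(10, 6), c = rnorm(10, 5),
                    d = rnorm(10, 2), e = rnorm(10, 5), f = rnorm(10, 2))

pttn <- split(names(frame)[-1], rep(1:3, each = 2))

# All non-empty subsets of the 3 partitions, each a list of character vectors
combos <- do.call(c, lapply(seq_along(pttn), combn, x = pttn, simplify = FALSE))

# Flatten each subset into the column names it selects
col_sets <- map(combos, ~ reduce(.x, c))

# Same 7 data frames as the answer produces, but each one keeps the id column
with_id <- map(col_sets, ~ frame %>% select(id, all_of(.x)))
map(with_id, names)
```

combn() accepts a list as x, so with simplify = FALSE each combination comes back as a sub-list of partitions; do.call(c, ...) then concatenates the results for sizes 1, 2 and 3 into one flat list of 7.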

Hong
  • nice! I would just change `pttn` to be `pttn <- split(names(frame)[-1], rep(1:3, each = 2))` – bouncyball Mar 29 '21 at 18:07
  • Thanks! The original answer was written before OP clarified what partitions should be used :) I've updated the answer accordingly! – Hong Mar 29 '21 at 18:13
  • I know we are not supposed to use the comments for this, but this answer is amazing... a bit magical even. – Marcelo Avila Mar 29 '21 at 18:20
  • Perfect solution. Thank you! – user3311147 Mar 29 '21 at 19:22
  • How can I generate each combination on the fly during processing, rather than producing all combinations as output at once? This would be useful for large datasets. Thanks – user3311147 Mar 30 '21 at 00:28
  • Can you clarify? Are you looking for something like `frame %>% select(pttn[[1]])` ? – Hong Mar 30 '21 at 01:22
  • I want to process each one of the 7 outputs further, one at a time. I do not want all 7 outputs generated at the same time. It will be helpful to reduce the memory usage if I can generate each of the 7 outputs, one at a time, and process it. I hope I made it clear. Thank you once again – user3311147 Mar 30 '21 at 01:47
  • frame %>% select(pttn[[1]]) works partially, but it prints the same solution 7 times. – user3311147 Mar 30 '21 at 01:48
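Regarding the follow-up about memory: one option, sketched here as an assumption rather than taken from the thread, is to precompute only the 7 sets of column names (which are tiny) and build each data frame inside a loop, processing it and discarding it before the next one. Note that select(pttn[[1]]) always picks the first partition, so repeating it yields the same data frame each time; the loop index has to pick a different combination on each pass. process_subset() is a hypothetical placeholder for whatever analysis is run on each subset:

```r
library(dplyr)
library(purrr)

set.seed(42)  # reproducible example data
frame <- data.frame(id = letters[1:10],
                    a = rnorm(10, 4), b = rnorm(10, 6), c = rnorm(10, 5),
                    d = rnorm(10, 2), e = rnorm(10, 5), f = rnorm(10, 2))
pttn <- split(names(frame)[-1], rep(1:3, each = 2))

# Only the 7 column-name vectors are held in memory, not 7 data frames
col_sets <- do.call(c, lapply(seq_along(pttn), combn, x = pttn, simplify = FALSE)) %>%
  map(~ reduce(.x, c))

# Hypothetical processing step: replace with your own analysis
process_subset <- function(df) colMeans(df[, -1, drop = FALSE])

results <- vector("list", length(col_sets))
for (i in seq_along(col_sets)) {
  subset_df <- frame %>% select(id, all_of(col_sets[[i]]))  # build one subset
  results[[i]] <- process_subset(subset_df)                 # process it
  rm(subset_df)                                             # release it before the next pass
}
```

frame itself stays in memory; the saving is that at most one subset data frame exists at any time, and only the (presumably smaller) per-subset results accumulate.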

A possible solution using purrr::map() and some data wrangling to long/wide format. It might not be the most efficient or elegant solution, but it does the job.

library(tidyverse)

# sample data
frame <- data.frame(
  id = letters[seq(from = 1, to = 10)], a = rnorm(10, 4), b = rnorm(10, 6), c = rnorm(10, 5),
  d = rnorm(10, 2), e = rnorm(10, 5), f = rnorm(10, 2))

# list of possible combinations 
list_of_combinations <- list(
  c(1), 
  c(2),
  c(3),
  c(1,2),
  c(1,3),
  c(2,3),
  c(1,2,3)
)

# data in long format and a category variable (for each "chunk")
df_long <- frame %>% pivot_longer(-id) %>% 
  mutate(
    cat = case_when(
      (name %in% c("a", "b")) ~ 1L, 
      (name %in% c("c", "d")) ~ 2L, 
      (name %in% c("e", "f")) ~ 3L)
  )
df_long
#> # A tibble: 60 x 4
#>    id    name   value   cat
#>    <chr> <chr>  <dbl> <int>
#>  1 a     a      3.93      1
#>  2 a     b      4.66      1
#>  3 a     c      2.78      2
#>  4 a     d      2.35      2
#>  5 a     e      5.93      3
#>  6 a     f     -0.500     3
#>  7 b     a      5.11      1
#>  8 b     b      5.37      1
#>  9 b     c      4.61      2
#> 10 b     d      3.58      2
#> # … with 50 more rows

# map list to generate a list of each combination and then map it back into wide format 
final_list_of_dfs <- list_of_combinations %>% map( ~ df_long %>% filter(cat %in% .x)) %>% 
  map(~ .x %>% select(-cat) %>% pivot_wider(names_from = "name", values_from = "value"))
glimpse(final_list_of_dfs)
#> List of 7
#>  $ : tibble [10 × 3] (S3: tbl_df/tbl/data.frame)
#>   ..$ id: chr [1:10] "a" "b" "c" "d" ...
#>   ..$ a : num [1:10] 3.93 5.11 4.16 2.59 2.69 ...
#>   ..$ b : num [1:10] 4.66 5.37 3.26 5.52 6.29 ...
#>  $ : tibble [10 × 3] (S3: tbl_df/tbl/data.frame)
#>   ..$ id: chr [1:10] "a" "b" "c" "d" ...
#>   ..$ c : num [1:10] 2.78 4.61 3.8 3.06 4.68 ...
#>   ..$ d : num [1:10] 2.353 3.579 0.744 1.582 3.377 ...
#>  $ : tibble [10 × 3] (S3: tbl_df/tbl/data.frame)
#>   ..$ id: chr [1:10] "a" "b" "c" "d" ...
#>   ..$ e : num [1:10] 5.93 3.89 5.43 3.88 5.51 ...
#>   ..$ f : num [1:10] -0.5 0.941 3.703 2.035 0.611 ...
#>  $ : tibble [10 × 5] (S3: tbl_df/tbl/data.frame)
#>   ..$ id: chr [1:10] "a" "b" "c" "d" ...
#>   ..$ a : num [1:10] 3.93 5.11 4.16 2.59 2.69 ...
#>   ..$ b : num [1:10] 4.66 5.37 3.26 5.52 6.29 ...
#>   ..$ c : num [1:10] 2.78 4.61 3.8 3.06 4.68 ...
#>   ..$ d : num [1:10] 2.353 3.579 0.744 1.582 3.377 ...
#>  $ : tibble [10 × 5] (S3: tbl_df/tbl/data.frame)
#>   ..$ id: chr [1:10] "a" "b" "c" "d" ...
#>   ..$ a : num [1:10] 3.93 5.11 4.16 2.59 2.69 ...
#>   ..$ b : num [1:10] 4.66 5.37 3.26 5.52 6.29 ...
#>   ..$ e : num [1:10] 5.93 3.89 5.43 3.88 5.51 ...
#>   ..$ f : num [1:10] -0.5 0.941 3.703 2.035 0.611 ...
#>  $ : tibble [10 × 5] (S3: tbl_df/tbl/data.frame)
#>   ..$ id: chr [1:10] "a" "b" "c" "d" ...
#>   ..$ c : num [1:10] 2.78 4.61 3.8 3.06 4.68 ...
#>   ..$ d : num [1:10] 2.353 3.579 0.744 1.582 3.377 ...
#>   ..$ e : num [1:10] 5.93 3.89 5.43 3.88 5.51 ...
#>   ..$ f : num [1:10] -0.5 0.941 3.703 2.035 0.611 ...
#>  $ : tibble [10 × 7] (S3: tbl_df/tbl/data.frame)
#>   ..$ id: chr [1:10] "a" "b" "c" "d" ...
#>   ..$ a : num [1:10] 3.93 5.11 4.16 2.59 2.69 ...
#>   ..$ b : num [1:10] 4.66 5.37 3.26 5.52 6.29 ...
#>   ..$ c : num [1:10] 2.78 4.61 3.8 3.06 4.68 ...
#>   ..$ d : num [1:10] 2.353 3.579 0.744 1.582 3.377 ...
#>   ..$ e : num [1:10] 5.93 3.89 5.43 3.88 5.51 ...
#>   ..$ f : num [1:10] -0.5 0.941 3.703 2.035 0.611 ...

Created on 2021-03-29 by the reprex package (v1.0.0)
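A variant of this answer, sketched here rather than taken from it, avoids both the hand-written list_of_combinations and the long/wide round trip: generate the chunk subsets with combn() and select the matching columns directly through a name-to-chunk lookup. The lookup vector cat_of is an illustrative name, assumed to mirror the case_when() mapping above:

```r
library(dplyr)
library(purrr)

set.seed(42)  # reproducible example data
frame <- data.frame(id = letters[1:10],
                    a = rnorm(10, 4), b = rnorm(10, 6), c = rnorm(10, 5),
                    d = rnorm(10, 2), e = rnorm(10, 5), f = rnorm(10, 2))

# Chunk (partition) of each value column; assumed to match the case_when() above
cat_of <- c(a = 1L, b = 1L, c = 2L, d = 2L, e = 3L, f = 3L)

# All non-empty subsets of the chunks, in the same order as the hand-written list:
# 1, 2, 3, (1,2), (1,3), (2,3), (1,2,3)
n_chunks <- max(cat_of)
list_of_combinations <- do.call(
  c, lapply(seq_len(n_chunks), function(k) combn(n_chunks, k, simplify = FALSE))
)

# Keep the id column plus every column whose chunk is in the current subset
final_list_of_dfs <- map(
  list_of_combinations,
  ~ frame %>% select(id, all_of(names(cat_of)[cat_of %in% .x]))
)
map(final_list_of_dfs, names)
```

Selecting columns directly keeps each column's original type, whereas pivot_longer() needs all value columns to share a compatible type. The results here are plain data frames rather than tibbles, because frame is a data.frame.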

Marcelo Avila