Possible combinations of pandas columns from a list

Question

I have a pandas dataframe in python, with the following structure:

Date	A	B	C	D	E	F	G	H
2022-01-01 10:00:00	1	3	0	4	1	0	0	1
2022-01-01 11:00:00	0	1	1	2	0	2	3	5
2022-01-01 12:00:00	0	2	0	1	0	1	0	0
2022-01-01 13:00:00	0	0	2	0	0	3	1	2
2022-01-02 10:00:00	0	3	0	1	2	4	4	1
2022-01-02 11:00:00	0	0	6	0	0	0	5	0
2022-01-02 12:00:00	0	0	6	0	2	4	5	3
2022-01-02 13:00:00	0	0	0	1	3	0	0	0

This is a snippet of my real dataframe, which has lots of "letter" columns (more than 100). My problem is that I need to obtain the sum of the numbers for each datetime and letter for different combinations.

This was answered in All possible combinations as new columns of a dataframe based on primary column, but it computes all possible combinations from N to M elements. For a df of 100 columns, all possible combinations from 1 sigle letter column to the full combined 100 is madness and impossible to compute.

Fortunately, I am only interested in certain combinations (again, ca. 50 in the real df), which are given in a list:

list_possibilities = ['A+B', 'A+B+D', 'B+D+E+G', 'F+H', 'D+E+F+G+H', 'D', 'F', 'G+H']

As you can see, single columns such as "D" or "F" may also be a possibility. So the final df would be fairly compact, and wouldn't need a brute-force computation of thousands of combinations (here shown only for the first two rows for simplicity):

Date	A+B	A+B+D	B+D+E+G	F+H	D+E+F+G+H	D	F	G+H
2022-01-01 10:00:00	4	8	8	1	6	4	0	1
2022-01-01 11:00:00	1	3	6	7	12	2	2	8

Knowing the prior structure of the combinations allows to exponentially decrease the combinations and sums pandas must do, but I am unable to generalize the previous code of the solution to this case. Any ideas?

score 4 · Accepted Answer · answered Dec 22 '22 at 11:18

Use concat in a dictionary comprehension:

out = (pd
   .concat({cols: df.set_index('Date')[cols.split('+')].sum(axis=1)
            for cols in list_possibilities}, axis=1)
   .reset_index()
)

Output:

                  Date  A+B  A+B+D  B+D+E+G  F+H  D+E+F+G+H  D  F  G+H
0  2022-01-01 10:00:00    4      8        8    1          6  4  0    1
1  2022-01-01 11:00:00    1      3        6    7         12  2  2    8
2  2022-01-01 12:00:00    2      3        3    1          2  1  1    0
3  2022-01-01 13:00:00    0      0        1    5          6  0  3    3
4  2022-01-02 10:00:00    3      4       10    5         12  1  4    5
5  2022-01-02 11:00:00    0      0        5    0          5  0  0    5
6  2022-01-02 12:00:00    0      0        7    7         14  0  4    8
7  2022-01-02 13:00:00    0      1        4    0          4  1  0    0

This is a very elegant and efficient solution. Thank you very much for your help! — datadatadata, Dec 22 '22 at 11:55

Possible combinations of pandas columns from a list

1 Answers1