I would like to partially "collapse" a DataFrame/matrix and keep the structure intact by just summing the condensed values. For example, I have this:
CHROM POS GENE DESC JOE FRED BILLY SUSAN TONY
10 1442 LOXL4 bad 1 0 0 1 0
10 335 LOXL4 bad 1 0 0 0 0
10 3438 LOXL4 good 0 0 1 0 0
10 4819 PYROXD2 bad 0 1 0 0 0
10 4829 PYROXD2 bad 0 1 0 1 0
10 9851 HPS1 good 1 0 0 0 0
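To make the sample reproducible, the table above can be read straight into a DataFrame. This is a minimal sketch assuming the columns are whitespace-separated. The name `mytable` matches the snippet in the question, and the `descriptors`/`people` split is an illustrative assumption that the first four columns are always the descriptors.

```python
import io

import pandas as pd

# The sample input from the question, as whitespace-separated text
raw = """CHROM POS GENE DESC JOE FRED BILLY SUSAN TONY
10 1442 LOXL4 bad 1 0 0 1 0
10 335 LOXL4 bad 1 0 0 0 0
10 3438 LOXL4 good 0 0 1 0 0
10 4819 PYROXD2 bad 0 1 0 0 0
10 4829 PYROXD2 bad 0 1 0 1 0
10 9851 HPS1 good 1 0 0 0 0
"""

# sep=r"\s+" splits each line on any run of whitespace
mytable = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Assumed layout: the first four columns describe the row, the rest are people
descriptors = list(mytable.columns[:4])
people = list(mytable.columns[4:])
print(descriptors)
print(people)
```

Because `people` is taken positionally, adding another person column to the file would not require changing this code.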
The first 4 columns are descriptors, and the last 5 columns are people/observations. The end goal is to count the total number of "good" and "bad" observations per GENE per person. Thus, I want this:
GENE DESC JOE FRED BILLY SUSAN TONY
LOXL4 bad 2 0 0 1 0
LOXL4 good 0 0 1 0 0
PYROXD2 bad 0 2 0 1 0
HPS1 good 1 0 0 0 0
The following code collapses all the individual observations (Joe, Fred, etc.) into a single count per group. How can I keep them separate? I would also like the solution to be flexible enough to accommodate more individuals in the future (keeping the same 4 descriptor columns).
mytable.groupby(['GENE','DESC']).size()
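One possible approach is to sum instead of count: `size()` returns one row count per group, whereas `sum()` applied to the person columns adds up each column independently. This is a sketch, assuming (as stated in the question) that the first four columns are always the descriptors, so any people added later are picked up by `mytable.columns[4:]` without code changes. `sort=False` keeps the groups in order of first appearance, which matches the desired output.

```python
import pandas as pd

# The sample input from the question
mytable = pd.DataFrame({
    "CHROM": [10, 10, 10, 10, 10, 10],
    "POS": [1442, 335, 3438, 4819, 4829, 9851],
    "GENE": ["LOXL4", "LOXL4", "LOXL4", "PYROXD2", "PYROXD2", "HPS1"],
    "DESC": ["bad", "bad", "good", "bad", "bad", "good"],
    "JOE": [1, 1, 0, 0, 0, 1],
    "FRED": [0, 0, 0, 1, 1, 0],
    "BILLY": [0, 0, 1, 0, 0, 0],
    "SUSAN": [1, 0, 0, 0, 1, 0],
    "TONY": [0, 0, 0, 0, 0, 0],
})

# Every column after the 4 descriptors is treated as a person
people = list(mytable.columns[4:])

# Sum each person's column within each (GENE, DESC) group,
# keeping groups in the order they first appear
collapsed = (
    mytable.groupby(["GENE", "DESC"], sort=False)[people]
    .sum()
    .reset_index()
)
print(collapsed)
```

Selecting `[people]` before calling `sum()` matters: without it, `CHROM` and `POS` would be summed along with the observations.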