Let's assume we have a table like:
id chr val1 val2
... A 2 10
... B 4 20
... A 3 30
...and we'd like to have a contingency table like this (grouped by chr
, thus using 'A'
and 'B'
as the row indices and then summing up the values for val1
and val2
):
val1 val2 total
A 5 40 45
B 4 20 24
total 9 60 69
How can we achieve this?
pd.crosstab(index=df.chr, columns=["val1", "val2"])
looked quite promising but it just counts the rows and does not sum up the values.
I have also tried (numerous times) to supply the values manually...
pd.crosstab(
index=df.chr.unique(),
columns=["val1", "val2"],
values=[
df.groupby("chr")["val1"],
df.groupby("chr")["val2"]
],
aggfunc=sum
)
...but this always ends up in shape mismatches and when I tried to reshape via NumPy:
values=np.array([
df.groupby("chr")["val1"].values,
df.groupby("chr")["val2"].values
].reshape(-1, 2)
...crosstab
tells me that it expected 1 value instead of the two given for each row.