I would like to count the number of all possible pairwise relations through a column (Value
) based on another column (ID
).
Example dataframe:
ID Value
0 1 A
1 1 A
2 1 A
3 1 B
4 1 C
5 2 B
6 2 C
7 2 C
To generate example dataframe:
import pandas as pd
df = pd.DataFrame({'ID' : {0: 1, 1: 1, 2: 1, 3: 1, 4: 1,
5: 2, 6: 2, 7: 2},
'Value' : {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'C',
5: 'B', 6: 'C', 7: 'C'}
})
Pairwise count should be performed for ID=1
and ID=2
.
Possible pairwises where ID=1
(A,A), (A,A), (A,B), (A,C),
(A,A), (A,A), (A,B), (A,C),
(A,A), (A,A), (A,B), (A,C),
(B,A), (B,A), (B,A), (B,C),
(C,A), (C,A), (C,A), (C,B),
Possible pairwises where ID=2
(B,C), (B,C)
(C,B), (C,C)
(C,B), (C,C)
Expected dataframe:
A B C
A 6 3 3
B 3 0 3
C 3 3 2
What I have currently got (see below relationship with other stackoverflow question):
df = pd.merge(df, df, on='ID')
df = pd.crosstab(df['Value_x'], df['Value_y']).rename_axis(None).rename_axis(None, axis=1)
print (df)
The wrong output:
A B C
A 9 3 3
B 3 2 3
C 3 3 5
As you might spot that the issue is mainly related with diaoganal side. I assumed that I have to focus on merge side to handle the proposed scenario. However, I could not handle it so far :( Any suggestions ? Thanks in advance!
Related question: There are various similarities with that question. However that question might have slightly wrong expectaions. The case of (A, A) = 0, (B,B) = 0, (C,C) = 0 should be 0 because they are not exist in both case (ID=1 or ID=2) based on that question. If we want to figure out counting only those conditions > AB, AC, BA, BC, CA, CB (from ID=1) and BC, CB (from ID=2) for that question. On the other hand, main difference here is on the diagonal side.