I'm trying to count how often each combination of two strings occurs, regardless of which string comes first and which comes second.
Here is my code:
```python
import pandas as pd
mylist = [[('Smith JR', 'Kim YY'), ('Smith JR', 'Ron AA'), ('Kim YY', 'Ron AA')],
          [('Kim YY', 'Smith JR')], [('Smith JR', 'Ron AA')]]
flat_list = [item for sublist in mylist for item in sublist]
df = pd.DataFrame(flat_list, columns=["From", "To"])
df_graph = df.groupby(["From", "To"]).size().reset_index()
df_graph.columns = ["From", "To", "Count"]
print(df_graph)
```
which gives:
```
       From        To  Count
0    Kim YY    Ron AA      1
1    Kim YY  Smith JR      1
2  Smith JR    Kim YY      1
3  Smith JR    Ron AA      2
```
But since `Kim YY` → `Smith JR` and `Smith JR` → `Kim YY` are a connection between the same two people, I want it to give:
```
       From        To  Count
0    Kim YY    Ron AA      1
1    Kim YY  Smith JR      2
2  Smith JR    Ron AA      2
```
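A minimal sketch, assuming it is acceptable to store every pair in alphabetical order: sort the two names inside each tuple before building the DataFrame, so both directions of a connection become the same key, then count as before. One difference from the desired output: with alphabetical ordering the last row comes out as `Ron AA` / `Smith JR` rather than `Smith JR` / `Ron AA`.

```python
import pandas as pd

mylist = [[('Smith JR', 'Kim YY'), ('Smith JR', 'Ron AA'), ('Kim YY', 'Ron AA')],
          [('Kim YY', 'Smith JR')], [('Smith JR', 'Ron AA')]]

# Flatten, and put each pair in a canonical (alphabetical) order so that
# ('Smith JR', 'Kim YY') and ('Kim YY', 'Smith JR') become the same tuple.
flat_list = [tuple(sorted(pair)) for sublist in mylist for pair in sublist]

df = pd.DataFrame(flat_list, columns=["From", "To"])

# size() counts the rows in each (From, To) group; name the column directly.
df_graph = df.groupby(["From", "To"]).size().reset_index(name="Count")
print(df_graph)
```

Sorting at the tuple level keeps the rest of the original pipeline unchanged; only the flattening step differs.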
I have seen a number of solutions that remove the duplicated row, but they don't add up the counts from both rows as I want. I can't figure out how to combine these two rows:
```
1    Kim YY  Smith JR      1
2  Smith JR    Kim YY      1
```
so that only the `Kim YY` / `Smith JR` row remains, with a Count of 2. Also, in my actual data the count for a given row can be greater than 1.
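When the counts are already aggregated (so a single row's Count can exceed 1), one possible approach is to sort the two name columns within each row with `numpy.sort(..., axis=1)` and then `sum` the existing counts per pair, rather than counting rows with `size()`. The frame below is rebuilt by hand from the first output above purely for illustration, and, as with any order-normalizing approach, the resulting `From`/`To` order is alphabetical.

```python
import numpy as np
import pandas as pd

# Illustrative pre-aggregated data (copied from the first output above):
# the same pair appears in both directions and Count may already exceed 1.
df_graph = pd.DataFrame({
    "From": ["Kim YY", "Kim YY", "Smith JR", "Smith JR"],
    "To": ["Ron AA", "Smith JR", "Kim YY", "Ron AA"],
    "Count": [1, 1, 1, 2],
})

# Sort the two names inside each row so both directions share one key.
pairs = np.sort(df_graph[["From", "To"]].to_numpy(), axis=1)
normalized = pd.DataFrame(pairs, columns=["From", "To"])
normalized["Count"] = df_graph["Count"].to_numpy()

# Add up the existing counts (sum, not size) for each unordered pair.
combined = normalized.groupby(["From", "To"], as_index=False)["Count"].sum()
print(combined)
```

Summing instead of counting is what preserves counts greater than 1; `size()` would treat each row as a single occurrence.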