I am trying to achieve something similar to the question "GroupBy results to dictionary of lists".
Column1 Column2 Column3
0 23 1
1 5 2
1 2 3
1 19 5
2 56 1
2 22 2
3 2 4
3 14 5
4 59 1
5 44 1
5 1 2
5 87 3
sdf.groupby('Column1')['Column3'].apply(list).to_dict()
works perfectly.
However, I need a list of tuples built from multiple columns. I tried something like:
sdf.groupby('Column1')[['Column2', 'Column3']].apply(list).to_dict()
hoping to get an output like:
{0: [(23, 1)],
1: [(5, 2), (2, 3), (19, 5)],
...}
but that call returns the column headers instead of the values.
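As far as I can tell, the headers show up because iterating a DataFrame walks its column labels rather than its rows, so list() on a multi-column group gives the column names. A minimal sketch of that behaviour, using a hand-built frame with only the Column1 == 1 rows from the sample above (the variable names group, headers and rows are just for illustration):

```python
import pandas as pd

sdf = pd.DataFrame({
    "Column1": [1, 1, 1],
    "Column2": [5, 2, 19],
    "Column3": [2, 3, 5],
})

# The kind of sub-frame that apply() receives for the group Column1 == 1.
group = sdf.loc[sdf["Column1"] == 1, ["Column2", "Column3"]]

# Iterating a DataFrame walks its column labels, so list() yields headers.
headers = list(group)

# Iterating the rows explicitly yields the value tuples instead.
rows = list(group.itertuples(index=False, name=None))

print(headers)
print(rows)
```

With a single column the group is a Series, which iterates over its values, and that would explain why the one-column version works.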
Below is my workaround, which seems like too much work for this outcome:
from collections import defaultdict

import pandas as pd


def get_dict_of_set_from_df(df: pd.DataFrame, key_cols: list, val_cols: list) -> dict:
    """
    Generic method to create Dict[key_cols] = set(val_cols)
    :param df: source dataframe
    :param key_cols: columns that form the dictionary key
    :param val_cols: columns that form each value in the set
    :return: dict mapping each key to the set of values seen for it
    """
    # df.groupby(key_cols)[val_cols].apply(set).to_dict()
    cols = key_cols + val_cols
    len_key = len(key_cols)
    len_val = len(val_cols)
    # get all relevant columns (key_cols and val_cols) from the dataframe
    l_ = df[cols].values.tolist()
    dc = defaultdict(set)
    for c in l_:
        # if key or val is a singleton, then do not put into tuple
        k = tuple(c[:len_key]) if len_key > 1 else c[:len_key][0]
        v = tuple(c[len_key:]) if len_val > 1 else c[len_key:][0]
        dc[k].add(v)
    return dc
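One shorter route that might do the same job is to zip the value columns into a single Series of tuples first, and then reuse the single-column groupby that already works. This is only a sketch against the first rows of the sample data, not something checked on a large frame; the names pairs and result are hypothetical.

```python
import pandas as pd

sdf = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2],
    "Column2": [23, 5, 2, 19, 56, 22],
    "Column3": [1, 2, 3, 5, 1, 2],
})

# Zip the value columns into one Series of tuples, keeping the original index
# so it stays aligned with Column1.
pairs = pd.Series(list(zip(sdf["Column2"], sdf["Column3"])), index=sdf.index)

# Group that single Series by Column1; apply(list) now collects tuples,
# because a Series iterates over its values rather than over headers.
result = pairs.groupby(sdf["Column1"]).apply(list).to_dict()

print(result)
```

Swapping list for set in the apply call should give sets of tuples per key, closer to what the workaround above returns.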