64

I have an excel sheet that looks like so:

Column1 Column2 Column3
0       23      1
1       5       2
1       2       3
1       19      5
2       56      1
2       22      2
3       2       4
3       14      5
4       59      1
5       44      1
5       1       2
5       87      3

And I'm looking to extract that data, group it by column 1, and add it to a dictionary so it appears like this:

{0: [1],
1: [2,3,5],
2: [1,2],
3: [4,5],
4: [1],
5: [1,2,3]}

This is my code so far

excel = pandas.read_excel(r"e:\test_data.xlsx", sheetname='mySheet', parse_cols'A,C')
myTable = excel.groupby("Column1").groups
print myTable

However, my output looks like this:

{0: [0L], 1: [1L, 2L, 3L], 2: [4L, 5L], 3: [6L, 7L], 4: [8L], 5: [9L, 10L, 11L]}

Thanks!

LondonRob
  • 73,083
  • 37
  • 144
  • 201
SuperDougDougy
  • 701
  • 1
  • 5
  • 10
  • for easy test, I add sample dataframe code. data = { 'col1': [0,1,1,1,2,2,3,3,4,5,5,5], 'col2': np.random.randint(0, 55, 12), 'col3': [1,2,3,5,1,2,4,5,1,1,2,3] } and df = pd.DataFrame(data) – hyun woo Cho Oct 21 '22 at 00:13

2 Answers2

97

You could groupby on Column1 and then take Column3 to apply(list) and call to_dict?

In [81]: df.groupby('Column1')['Column3'].apply(list).to_dict()
Out[81]: {0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}

Or, do

In [433]: {k: list(v) for k, v in df.groupby('Column1')['Column3']}
Out[433]: {0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}
Zero
  • 74,117
  • 18
  • 147
  • 154
46

According to the docs, the GroupBy.groups:

is a dict whose keys are the computed unique groups and corresponding values being the axis labels belonging to each group.

If you want the values themselves, you can groupby 'Column1' and then call apply and pass the list method to apply to each group.

You can then convert it to a dict as desired:

In [5]:

dict(df.groupby('Column1')['Column3'].apply(list))
Out[5]:
{0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}

(Note: have a look at this SO question for why the numbers are followed by L)

Community
  • 1
  • 1
EdChum
  • 376,765
  • 198
  • 813
  • 562