'''
groupby row, concat list
'''
import pandas as pd

d = {'col1': [33, 33, 33, 34, 34, 34],
     'col2': ["hello", "hello1", "hello2", "hello3", "hello4", "hello5"],
     'col3': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data=d)

dfQ = df.groupby('col1')['col2'].apply(list).reset_index()
print(dfQ)

The code above gives me only col1 and col2. How can I also display col3 alongside col1 and col2?

william007

2 Answers


IIUC, you can use groupby.agg:

df1 = df.groupby('col1', as_index=False).agg(list)

print(df1)

   col1                      col2       col3
0    33   [hello, hello1, hello2]  [1, 2, 3]
1    34  [hello3, hello4, hello5]  [4, 5, 6]
Abhi
  • Thanks, what is the difference between using `agg` and using `apply` as in my method? – william007 Oct 19 '18 at 07:42
  • @william007 [`agg`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.agg.html) aggregates each column within each group, so your function (`list`) is applied to each column per group. `agg` also gives you the flexibility to apply a different function to each column, whereas [`apply`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.apply.html) applies the function to each group as a whole. The [`groupby` docs](http://pandas.pydata.org/pandas-docs/stable/groupby.html) are really helpful. – Abhi Oct 19 '18 at 16:04
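
To illustrate the difference described in the comment above, here is a minimal sketch using the DataFrame from the question. The choice of `'sum'` for col3 is only an assumption made to show a different function per column; it is not something the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [33, 33, 33, 34, 34, 34],
    'col2': ["hello", "hello1", "hello2", "hello3", "hello4", "hello5"],
    'col3': [1, 2, 3, 4, 5, 6],
})

# agg with a dict: a different function for each column, applied per group.
# Using 'sum' on col3 is an illustrative choice, not part of the question.
per_column = df.groupby('col1', as_index=False).agg({'col2': list, 'col3': 'sum'})

# apply on one selected column: the function receives each group's Series,
# which is why the original code only returned col1 and col2.
col2_lists = df.groupby('col1')['col2'].apply(list).reset_index()

print(per_column)
print(col2_lists)
```

Passing `list` for both keys in the dict would give the same result as `agg(list)` in the answer above.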

You can use agg with a lambda function to list both your columns.

dfQ = df.groupby('col1').agg(lambda x: list(x)).reset_index()
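
As a rough check, assuming the same DataFrame as in the question, the lambda form and passing `list` directly should produce the same grouped result. The variable names `with_lambda` and `with_list` are just for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [33, 33, 33, 34, 34, 34],
    'col2': ["hello", "hello1", "hello2", "hello3", "hello4", "hello5"],
    'col3': [1, 2, 3, 4, 5, 6],
})

# The lambda wraps list(), so it behaves like passing list itself.
with_lambda = df.groupby('col1').agg(lambda x: list(x)).reset_index()
with_list = df.groupby('col1').agg(list).reset_index()

# Compare row by row as plain Python records.
same = with_lambda.to_dict('records') == with_list.to_dict('records')
print(same)
```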
panktijk