I tried to combine rows using the apply function on a DataFrame, but couldn't get it to work. I would like to combine rows into a single list when their (c1, c2) values are the same.

For example:

Dataframe df1
        c1  c2  c3
    0    0   x  {'a': 1, 'b': 2}
    1    0   x  {'a': 3, 'b': 4}
    2    0   y  {'a': 5, 'b': 6}
    3    0   y  {'a': 7, 'b': 8}
    4    2   x  {'a': 9, 'b': 10}
    5    2   x  {'a': 11, 'b': 12}

Expected result:

Dataframe df1
        c1  c2  c3
    0    0   x  [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
    1    0   y  [{'a': 5, 'b': 6}, {'a': 7, 'b': 8}]
    2    2   x  [{'a': 9, 'b': 10}, {'a': 11, 'b': 12}]
MaxU - stand with Ukraine
hyon
  • did you check this http://stackoverflow.com/questions/39954668/how-to-convert-column-with-list-of-values-into-rows-in-pandas-dataframe? – plasmon360 May 02 '17 at 17:04
  • you should specify (and tag) pandas or R or whatever you are using – greggo May 02 '17 at 17:04

1 Answer

Source Pandas DF:

In [20]: df
Out[20]:
   c1 c2                  c3
0   0  x    {'a': 1, 'b': 2}
1   0  x    {'a': 3, 'b': 4}
2   0  y    {'a': 5, 'b': 6}
3   0  y    {'a': 7, 'b': 8}
4   2  x   {'a': 9, 'b': 10}
5   2  x  {'a': 11, 'b': 12}

Solution:

In [21]: df.groupby(['c1','c2'])['c3'].apply(list).to_frame('c3').reset_index()
Out[21]:
   c1 c2                                       c3
0   0  x     [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
1   0  y     [{'a': 5, 'b': 6}, {'a': 7, 'b': 8}]
2   2  x  [{'a': 9, 'b': 10}, {'a': 11, 'b': 12}]
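For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt by hand from the sample data in the question, and the variable name `result` is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (c3 holds one dict per row).
df = pd.DataFrame({
    "c1": [0, 0, 0, 0, 2, 2],
    "c2": ["x", "x", "y", "y", "x", "x"],
    "c3": [
        {"a": 1, "b": 2}, {"a": 3, "b": 4},
        {"a": 5, "b": 6}, {"a": 7, "b": 8},
        {"a": 9, "b": 10}, {"a": 11, "b": 12},
    ],
})

# Group on (c1, c2) and collect the c3 values of each group into a list.
result = (
    df.groupby(["c1", "c2"])["c3"]
      .apply(list)
      .reset_index()
)

print(result)
```

Each row of `result` then holds one (c1, c2) pair and a list of the dicts that shared it, in their original row order.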

NOTE: I would recommend avoiding non-scalar values (lists, dicts) in pandas DataFrame cells, as they can cause various difficulties and performance issues.
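One possible way to follow that advice, sketched here under the assumption that every dict in c3 has the same keys ('a' and 'b'), is to expand the dicts into ordinary scalar columns and group on those instead. The names `flat` and `sums` are illustrative, and summing is just one example of an aggregation, not something the question asked for.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "c1": [0, 0, 0, 0, 2, 2],
    "c2": ["x", "x", "y", "y", "x", "x"],
    "c3": [
        {"a": 1, "b": 2}, {"a": 3, "b": 4},
        {"a": 5, "b": 6}, {"a": 7, "b": 8},
        {"a": 9, "b": 10}, {"a": 11, "b": 12},
    ],
})

# Expand each dict into its own scalar columns (assumes identical keys).
expanded = pd.DataFrame(df["c3"].tolist(), index=df.index)
flat = pd.concat([df[["c1", "c2"]], expanded], axis=1)

# With scalar columns, ordinary vectorized aggregations work per group.
sums = flat.groupby(["c1", "c2"], as_index=False)[["a", "b"]].sum()

print(flat)
print(sums)
```

In this layout the rows stay one per original record, and the grouping is applied only when a result is needed.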

MaxU - stand with Ukraine