3

I have a dataframe like this:

file:
      | FIRST | LAST | ID |
---------------------------
0      "ABC"     12    35 
1      "ABC"     14    35
2      "AB"      15    36

Now, what I want is:

file:
      | FIRST | LAST  | ID |
---------------------------
0      "ABC"   [12,14]  35 
2      "AB"      15     36

For this problem let's assume that if ID of two rows is equal then all the values except LAST is also equal.

Therefore, replace all the value except the values of last, which are added to a list.

I tried using solution given in this link: Pandas DataFrame - Combining one column's values with same index into list

I used this:

file = file.groupby('ID')

file = file['Last'].unique()

This is the output I got:

ID
35    [12, 14]
36        [15]
Name: Last, dtype: object

Probably, I am missing something in the groupby().

Thanks in advance :)

UPDATE:

My original Dataframe has more than 100 columns. if ID of two rows is equal then all the values except LAST is also equal.

2 Answers2

2

Is this what you want?

df.groupby(['FIRST', 'ID']).LAST.apply(lambda x: x.tolist()).reset_index()

    FIRST   ID  LAST
0   AB      36  [15]
1   ABC     35  [12, 14]
Vaishali
  • 37,545
  • 5
  • 58
  • 86
2

Given that only the last two rows are different for a given ID, just take the first value when applying a groupby to them. For the column `LAST', use its value or convert it to a list of unique items if there are more than one.

grouping_cols = ['ID', ...]
agg_cols = {col: 'first' for col in df if col not in grouping_cols}
agg_cols['LAST'] = lambda x: x.unique().tolist() if len(x) > 1 else x.iat[0]
>>> df.groupby(grouping_cols, as_index=False).agg(agg_cols)
  ID      LAST FIRST
0  35  [12, 14]   ABC
1  36        15    AB
Alexander
  • 105,104
  • 32
  • 201
  • 196