I have a pandas DataFrame like the following (but with 1,000 different IDs):

import pandas as pd

df1 = pd.DataFrame({'ID': [1,1,1,2,2,2,2,3,3,3,4,4,4,4,5,5],
                   'VALUE': ['first', 'second', 'third',
                             'second', 'second', 'first', 'fourth',
                             'first', 'second', 'first',
                             'third', 'third', 'third', 'first',
                             'second', 'first']})

I want every row to take the first VALUE of its ID group, while keeping all rows and the ID column. The current DataFrame:

    ID  VALUE
0   1   first
1   1   second
2   1   third
3   2   second
4   2   second
5   2   first
6   2   fourth
7   3   first
8   3   second
9   3   first
10  4   third
11  4   third
12  4   third
13  4   first
14  5   second
15  5   first

Expected Outcome:

    ID  VALUE
0   1   first
1   1   first
2   1   first
3   2   second
4   2   second
5   2   second
6   2   second
7   3   first
8   3   first
9   3   first
10  4   third
11  4   third
12  4   third
13  4   third
14  5   second
15  5   second

I tried using df1.groupby('ID').first(), but I can't add the result to df1 as a new column with the expected output: it fails because operands could not be broadcast together with different shapes.

Doni
    use `transform` e.g.: `df1['VALUE'] = df1.groupby('ID').transform('first')` – David Erickson Apr 09 '21 at 18:18
    1. `groupby` = consolidated dataframe. 2. `groupby` + `transform` = "calculated column" added to the existing dataframe. With just a `groupby`, you cannot add the resulting series (which has a different, shorter index) to an existing dataframe (which has a longer, different index). When adding a series to the dataframe, the index must be EXACTLY the same. `transform` keeps the exact index of the original rows on the new series, so that it can be added to the dataframe. That is the source of your error: `operands could not be broadcast together with different shapes`. – David Erickson Apr 09 '21 at 18:24
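
The difference the comments describe can be seen in a minimal sketch using the question's data. It assumes the result goes into a new column; the name `FIRST_VALUE` is made up for illustration and is not from the original post. Selecting `['VALUE']` before `transform` is a choice made here so that the result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5],
    'VALUE': ['first', 'second', 'third',
              'second', 'second', 'first', 'fourth',
              'first', 'second', 'first',
              'third', 'third', 'third', 'first',
              'second', 'first'],
})

# Plain aggregation: one row per ID, indexed by ID (5 rows, not 16).
per_group = df1.groupby('ID')['VALUE'].first()

# transform: the group's first VALUE is repeated for every row of the group,
# so the result has the same 16-row index as df1 and can be assigned directly.
# 'FIRST_VALUE' is a hypothetical column name chosen for this sketch.
df1['FIRST_VALUE'] = df1.groupby('ID')['VALUE'].transform('first')

print(per_group.shape, df1['FIRST_VALUE'].shape)
```

To overwrite the existing column instead, as in the first comment, assign the same expression to `df1['VALUE']`. Note that `'first'` picks the first non-null value in each group, which only matters if `VALUE` contains missing entries.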

0 Answers