
I have a DataFrame and ran this operation on it:

df1 = df1.groupby(['trip_departure_date']).agg(occ = ('occ', 'mean'))

The problem is that when I try to plot, it gives me an error and it says that trip_departure_date doesn't exist!

I did this:

df1.plot(x = 'trip_departure_date', y = 'occ', figsize = (8,5), color = 'purple')

and I get this error:

KeyError: 'trip_departure_date'

Please help!

  • That information is now the `index` of your DataFrame and `DataFrame.plot` can only reference columns. `df1.groupby(['trip_departure_date'], as_index=False)` should solve it, or just `reset_index()` after the `groupby` – ALollz May 03 '22 at 16:40
  • the problem is that now, `trip_departure_date` is not a column anymore! – Mateo Guajardo May 03 '22 at 16:46

1 Answer


Your question is similar to this question: pandas groupby without turning grouped by column into index

When you group by a column, that column stops being a regular column and becomes the index of the result. DataFrame.plot(x=...) looks the name up among the columns, so a label that exists only in the index raises the KeyError you saw. If you set as_index=False, pandas keeps the grouping column as a column instead of moving it to the index.

The second point is that you don't need .agg() to get the mean of occ grouped by trip_departure_date; calling .mean() on the grouped object does the same job. It is the groupby step, not the aggregation, that moves trip_departure_date into the index.
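To make the index-versus-column distinction concrete, here is a minimal sketch. The sample values are made up for illustration and are not the asker's data; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data (invented values, column names from the question)
df = pd.DataFrame({
    "trip_departure_date": ["2022-05-01", "2022-05-01", "2022-05-02"],
    "occ": [0.5, 1.0, 0.25],
})

# Default groupby: the grouping key ends up in the index
grouped = df.groupby(["trip_departure_date"]).agg(occ=("occ", "mean"))
print(list(grouped.columns))  # the key is not listed among the columns
print(grouped.index.name)     # the key is the name of the index instead

# as_index=False: the grouping key stays a regular column
flat = df.groupby(["trip_departure_date"], as_index=False)["occ"].mean()
print(list(flat.columns))
```

Checking the columns and the index name like this is a quick way to see why a plot call cannot find the grouping key.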

import pandas as pd

df1 = pd.read_csv("trip_departures.txt")


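# numeric_only=True skips non-numeric columns, which newer pandas versions refuse to average
df1_agg = df1.groupby(['trip_departure_date'], as_index=False).mean(numeric_only=True)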

Or if you only want to aggregate the occ column:

df1_agg = df1.groupby(['trip_departure_date'],as_index=False)['occ'].mean()


df1_agg.plot(x = 'trip_departure_date', y = 'occ', figsize = (8,5), color = 'purple')
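If you would rather keep the named aggregation from the question, the other route mentioned in the comments is to call reset_index() after aggregating. This is only a sketch: the data is invented, and the route column is a hypothetical extra string column added to show that it does not get in the way.

```python
import pandas as pd

# Hypothetical sample data; "route" is an invented extra string column
df1 = pd.DataFrame({
    "trip_departure_date": ["2022-05-02", "2022-05-01", "2022-05-01"],
    "route": ["A", "B", "A"],
    "occ": [0.25, 0.5, 1.0],
})

df1_agg = (
    df1.groupby("trip_departure_date")
       .agg(occ=("occ", "mean"))   # named aggregation, as in the question
       .reset_index()              # move the grouping key back into a column
)
print(df1_agg)
```

After this, both trip_departure_date and occ are ordinary columns, so the plot call with x = 'trip_departure_date' and y = 'occ' should be able to find them.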


K. Thorspear
  • thanks, you are right.. the problem is that when I use `as_index = False`, now my only columns are the index and the `occ` column. The `trip_departure_date` column now does not exist. I am trying to aggregate it, but since it is a string column there is no aggregate function that helps me with this. – Mateo Guajardo May 03 '22 at 17:37
  • Oh, I see what the problem is. I've updated my answer to solve. – K. Thorspear May 03 '22 at 18:22