What's the meaning of Python error message `KeyError: 'The grouper name created_at is not found'`?

Question

I'm trying to take a series of tweets and group them into 1 hour intervals based on when the tweets were created, and sum the likes the tweets got for each 1 hour interval.

The tweets have been converted to a pandas dataframe, eg:

df.head(1)
    author_id   username    author_followers    author_tweets   author_description  author_location text    created_at  lang    tweet_id    retweets    replies likes   quotes
0   2395138046  WorldCoinIndex  12832   46121   Cryptocurrency index | prices | 24hr volume | ...   None    Cryptocurrencies $ETH $LTC $DASH $XMR $ZCASH h...   2022-02-11 23:59:38+00:00   en  1492287240990507009 0   1   0   0

EXPECTATION

The code I'm applying to the above dataframe:

df.likes.resample('H', on='created_at').sum()

My understanding is that likes specifies the column to be summed, 'H' specifies the 1 hour time intervals, and the on parameter defines the time series key, created_at.

RESULTING ERROR MESSAGE

KeyError: 'The grouper name created_at is not found'

ASSESSMENT

When I search that error message, I see mostly references for the groupby method, which I considered, but figured Time Series would be simpler.

Shouldn't it return an index error if it's the 'created_at' parameter that's problematic?

keramat · Answer 1 · 2022-02-13T05:23:55.897

1

Based on the documentation for resample:

on : str, optional. For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.

When you select .likes first, you get a Series, and a Series has no columns, so there is no column with the name you pass to on and pandas cannot find the grouper. Example:

import pandas as pd

index = pd.date_range('1/1/2000', periods=9, freq='T')
df = pd.DataFrame({'likes': range(9), 'user':['ali' for i in range(9)]}, index=index)
df['create on'] = df.index

This produces the error:

df.likes.resample('3T', on = 'create on').sum()

And the right way:

df.resample('3T', on = 'create on').sum()

the output:
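For reference, here is a minimal self-contained sketch of the same idea applied to a frame shaped like the one in the question. The data is made up for illustration, and the `'60min'` alias is used in place of `'H'` only because the hour alias spelling differs between pandas versions. It reproduces the KeyError, then shows two ways to get hourly like totals.

```python
import pandas as pd

# Toy stand-in for the tweets frame (values are invented for illustration).
df = pd.DataFrame({
    "created_at": pd.date_range("2022-02-11 22:00", periods=6, freq="30min", tz="UTC"),
    "likes": [1, 2, 3, 4, 5, 6],
    "username": ["a"] * 6,
})

# Selecting the column first yields a Series, which no longer carries
# created_at, so the `on` lookup fails with a KeyError.
error_message = None
try:
    df.likes.resample("60min", on="created_at").sum()
except KeyError as exc:
    error_message = str(exc)

# Option 1: resample the DataFrame, then pick the column to aggregate.
hourly = df.resample("60min", on="created_at")["likes"].sum()

# Option 2: move the timestamps into the index, then resample the Series.
hourly_alt = df.set_index("created_at")["likes"].resample("60min").sum()

print(error_message)
print(hourly.tolist())      # [3, 7, 11]
print(hourly_alt.tolist())  # [3, 7, 11]
```

Both options restrict the result to the likes column, so the aggregation does not depend on how the other, non-numeric columns are handled.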

edited Feb 13 '22 at 05:23

answered Feb 13 '22 at 05:13

keramat


Something like `df.resample('3T', on='created_at')['likes'].sum()` is more likely what OP is looking for. Without specifying the column (or columns), pandas will try to apply `sum` to all columns in the DataFrame, which is likely to fail given all of the different column types in the shown sample data. – Henry Ecker Feb 13 '22 at 06:45
@HenryEcker it sums numeric only data, why would it fail? In fact @keramat's data has nonnumerics and it works – Feb 13 '22 at 07:39
@does it matter, he means with main question example. – keramat Feb 13 '22 at 07:42
@keramat doesnT matter – Feb 13 '22 at 07:43
@does it matter, You are right, but I prefer to restrict the result as much as possible. – keramat Feb 13 '22 at 07:45
@keramat yeah you're not doing that currently. my objection is to failure not to what to keep and not. – Feb 13 '22 at 07:46
