I'm trying to take a series of tweets and group them into 1 hour intervals based on when the tweets were created, and sum the likes the tweets got for each 1 hour interval.
The tweets have been converted to a pandas DataFrame, e.g.:
df.head(1)
author_id username author_followers author_tweets author_description author_location text created_at lang tweet_id retweets replies likes quotes
0 2395138046 WorldCoinIndex 12832 46121 Cryptocurrency index | prices | 24hr volume | ... None Cryptocurrencies $ETH $LTC $DASH $XMR $ZCASH h... 2022-02-11 23:59:38+00:00 en 1492287240990507009 0 1 0 0
EXPECTATION
The code I'm applying to the above dataframe:
df.likes.resample('H', on='created_at').sum()
My understanding is that likes specifies the column to be summed, 'H' specifies the 1 hour time intervals, and the on parameter names the time series key, created_at.
RESULTING ERROR MESSAGE
KeyError: 'The grouper name created_at is not found'
ASSESSMENT
When I search for that error message, I mostly see references to the groupby method, which I considered, but I figured time series resampling would be simpler.
Shouldn't it return an index error if it's the 'created_at' parameter that's problematic?
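For reference, here is a minimal sketch of the setup on a small made-up frame (the three rows and their like counts are invented; only the created_at and likes column names come from the dataframe above). One likely reading of the error is that df.likes is a Series, which no longer carries the created_at column, so the on parameter has nothing to look up. Resampling the whole DataFrame first and selecting likes afterwards appears to avoid that. pd.Timedelta(hours=1) is used here as a stand-in for the 'H' alias.

```python
import pandas as pd

# Hypothetical sample data; only the two relevant columns are kept.
df = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2022-02-11 22:15:00+00:00",
        "2022-02-11 22:45:00+00:00",
        "2022-02-11 23:59:38+00:00",
    ]),
    "likes": [3, 4, 0],
})

# Resample the DataFrame, which still contains created_at,
# and only then pick the likes column to sum per 1-hour bin.
hourly_likes = df.resample(pd.Timedelta(hours=1), on="created_at")["likes"].sum()

# Alternative with the same result: make created_at the index first,
# so the Series being resampled carries the timestamps itself.
hourly_likes_alt = df.set_index("created_at")["likes"].resample(pd.Timedelta(hours=1)).sum()

print(hourly_likes)
```

Both variants group the rows into the 22:00 and 23:00 bins; the second one works because the timestamps travel with the Series as its index, so no on parameter is needed.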