
I have a pandas DataFrame containing the data below, and I would like to add a new column which, for each date, returns the most frequently occurring 'weather_type' over the previous 3 days. Where there is a tie, I'd like the most recent 'weather_type' to be returned.

import pandas as pd

d = {'date': ['17/02/2017', '18/02/2017', '19/02/2017', '20/02/2017',
              '21/02/2017', '22/02/2017'],
     'precipmm': [1, 0, 3, 2, 7, 8],
     'weather_type': ['rain', 'sun', 'rain', 'snow', 'snow', 'rain']}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
df['rollingsum_precipmm'] = df['precipmm'].rolling(window=3).sum()

I've already managed to create a new column containing the 3-day rolling sum of 'precipmm' using the following:

df['rollingsum_precipmm'] = df['precipmm'].rolling(window=3).sum()

I suspect the answer revolves around this, but so far I've been unable to find a solution.
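A minimal sketch of the requested rule in plain Python, assuming one row per day with no gaps and a window that includes the current row (the same convention as `rolling(window=3).sum()`). The helper names `latest_mode` and `rolling_latest_mode` are hypothetical, not pandas API:

```python
import pandas as pd
from collections import Counter


def latest_mode(values):
    """Most frequent item in `values`; ties go to the item seen last."""
    counts = Counter(values)
    best = max(counts.values())
    # Walk backwards so that, among tied items, the most recent one wins
    for v in reversed(values):
        if counts[v] == best:
            return v


def rolling_latest_mode(series, window):
    """Rolling mode over the last `window` rows, current row included.

    Assumes one row per day; rows without a full window are left as None.
    """
    vals = series.tolist()
    out = [None] * len(vals)
    for i in range(window - 1, len(vals)):
        out[i] = latest_mode(vals[i - window + 1 : i + 1])
    return pd.Series(out, index=series.index, dtype=object)


df = pd.DataFrame({'weather_type': ['rain', 'sun', 'rain', 'snow', 'snow', 'rain']})
df['most_common'] = rolling_latest_mode(df['weather_type'], window=3)
print(df)
```

On the sample data the last four rows come out as 'rain', 'snow', 'snow', 'snow'. The 20/02 window (sun, rain, snow) is a three-way tie, so the most recent value, 'snow', wins.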

Any help much appreciated, as always.

Colin

Colin Blyth
  • add example df in code please – ivan7707 Feb 28 '18 at 18:32
  • `Where the result is a tie, i'd like the most recent 'weather_type' to be returned.` is this to be returned to another column? – TYZ Feb 28 '18 at 18:47
  • No, the same column, please. – Colin Blyth Feb 28 '18 at 18:49
  • @ColinBlyth It's not quite reasonable to have different data types in the same column, and this usually makes further analysis and processing difficult. Is there a reason that you want to put them in the same column? – TYZ Feb 28 '18 at 19:23

2 Answers

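To get a rolling mode, you can do:

from scipy.stats import mode

# rolling().apply() only works on numeric columns, so this handles 'precipmm'
# but not the string column 'weather_type'
df['precipmm'].rolling(window=3).apply(lambda x: mode(x)[0])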
TYZ
  • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.mode.html – Stop harming Monica Feb 28 '18 at 18:54
  • I've tried `df['most_common'] = df['weather_type'].rolling(window=3).apply(lambda x: mode(x)[0])` but it doesn't appear to work. The value on the final row of this new column is 'rain', but it should be 'snow'. – Colin Blyth Feb 28 '18 at 19:09
  • @ColinBlyth This piece of code only produces the rolling mode. I haven't figured out the second part, and I replied to you about that in a comment on your question, asking why you want to do it. – TYZ Feb 28 '18 at 19:24
  • Sorry about that, I misunderstood the question. I would like this result to appear in a new column. – Colin Blyth Feb 28 '18 at 19:56
  • @Goyo That is not applicable to a Rolling object; it's only for Series. – TYZ Feb 28 '18 at 20:17
  • @YilunZhang I thought it would be easy to use it to remove the dependency on `scipy.stats`, but after a closer look it does not look so easy. – Stop harming Monica Feb 28 '18 at 20:56
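Since `rolling().apply()` only accepts numeric data, one possible workaround (a sketch, not part of the answer above) is to encode the strings with `pd.factorize`, compute the mode on the integer codes with the most-recent tie rule, and map the codes back to labels. It assumes one row per day and a window that includes the current row; `latest_mode_code` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def latest_mode_code(window):
    """Mode of a window of numeric codes; ties go to the most recent code."""
    values, counts = np.unique(window, return_counts=True)
    best = counts.max()
    # Scan from the newest element backwards and return the first tied code
    for v in window[::-1]:
        if counts[np.searchsorted(values, v)] == best:
            return v


df = pd.DataFrame({'weather_type': ['rain', 'sun', 'rain', 'snow', 'snow', 'rain']})

# rolling() works on numbers, so encode the strings as integer codes first
codes, uniques = pd.factorize(df['weather_type'])
rolled = pd.Series(codes, index=df.index).rolling(window=3).apply(latest_mode_code, raw=True)

# Map the (float) codes back to labels; rows without a full window stay NaN
df['most_common'] = rolled.map(lambda c: uniques[int(c)], na_action='ignore')
print(df)
```

Passing `raw=True` hands each window to the function as a NumPy array, which keeps `np.unique` and the reversed scan simple.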

To put the result in a new column:

from scipy.stats import mode

df = df.assign(new_column=df['precipmm'].rolling(window=3).apply(lambda x: mode(x)[0]))
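A rolling mode on its own does not apply the "most recent wins" tie rule: `scipy.stats.mode` is documented to return the smallest of several tied values, and pandas' `Series.mode` returns tied modes in sorted order. A small check, using `Series.mode` (swapped in here to sidestep SciPy version differences in what `mode()` returns) on 'precipmm', where every 3-day window is a tie:

```python
import pandas as pd

precip = pd.Series([1, 0, 3, 2, 7, 8])

# Every window holds three distinct values, so all of them tie;
# .iloc[0] takes the smallest, not the most recent
rolling_mode = precip.rolling(window=3).apply(lambda x: x.mode().iloc[0], raw=False)
print(rolling_mode.tolist())  # [nan, nan, 0.0, 0.0, 2.0, 2.0]
```

The last row comes out as 2.0 even though 8 is the most recent value in that window.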
imbr
Ari