I'm writing a moving average function on time series:

import pandas

def datedat_moving_mean(datedat, window):
    # window is the number of samples to average over
    datedatF = pandas.DataFrame(datedat)
    return datedatF.rolling(window).mean().values

The code above is copied from Moving Average- Pandas

Then I apply this function to this time series:

datedat1 = numpy.array(
[ pandas.date_range(start=datetime.datetime(2015, 1, 30),periods=17),
numpy.random.rand(17)]).T

However, datedat_moving_mean(datedat1, 4) just returns the original datedat1. Nothing gets averaged! What's wrong?

Harry
  • If you do just `datedatF[1].rolling(4).mean()`, it gives you a numeric result. Since you call `rolling.mean` on the entire dataframe, and since the first column is not numeric, I believe it returns the input without doing anything. – cs95 Jan 11 '18 at 06:23
  • I believe this has to do with the fact that the first column in your dataframe is a column of Timestamp objects (which is different from a datetime column), so pandas silently returns without raising any errors (which it would've done had the column been a datetime one). – cs95 Jan 11 '18 at 06:25
  • Errr... how can I convert the `Timestamp` column to a datetime column? I tried `datedat_moving_mean([i.to_pydatetime() for i in datedat1[:,0]],4)`, but it still raises `ops for Rolling for this dtype datetime64[ns] are not implemented`. – Harry Jan 11 '18 at 06:30
  • `datedatF[1] = pd.to_datetime(datedatF[1])` and it will throw an error instead of giving incorrect output. – cs95 Jan 11 '18 at 06:31
  • `datedat1F=pandas.DataFrame(datedat1)` and `pandas.to_datetime(datedat1F[1]).values` still returns an array with `dtype='datetime64[ns]'`, which cannot be used for averaging... – Harry Jan 11 '18 at 06:45
  • Wrong. `datedat1F = pandas.DataFrame(datedat1); datedat1F.iloc[:, 0] = pandas.to_datetime(datedat1F.iloc[:, 0])` – cs95 Jan 11 '18 at 06:46
  • And then, `datedat1F.iloc[:, 1] = datedat1F.iloc[:, 1].rolling(4).mean()` – cs95 Jan 11 '18 at 06:46
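
The suggestion in the comments above, averaging only the numeric column, can be sketched as follows. This is a minimal illustration, not the asker's exact setup: it assumes the frame is built from a dict so that each column keeps its own dtype, and the column names `date`, `value` and `rolling_mean` are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical column names. Building from a dict keeps one dtype per
# column (datetime64 and float64) instead of a single object-dtype array.
df = pd.DataFrame({
    "date": pd.date_range(start="2015-01-30", periods=17),
    "value": np.arange(17, dtype=float),
})

# Roll over the numeric column only; the date column is left untouched.
# The first window - 1 rows have no full window and come out as NaN.
df["rolling_mean"] = df["value"].rolling(4).mean()
```

With deterministic values 0..16, the fourth row averages 0, 1, 2 and 3, which makes it easy to check by hand that the window is really being applied.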

1 Answer

Your construction of the DataFrame has no index (defaults to ints) and has a column of Timestamp and a column of floats.

I imagine that you want to use the Timestamps as an index, but even if not, you will need to do so in order to use .rolling() on the frame.

I would suggest that you initialise the original DataFrame more like this:

import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.rand(17), index=pd.date_range(start=datetime.datetime(2015, 1, 30), periods=17))
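
To see that the rolling mean works on a frame initialised this way, here is a small sketch. It swaps the random data for `np.arange` purely so the result can be checked by hand; the names `idx` and `smoothed` are illustrative.

```python
import datetime

import numpy as np
import pandas as pd

# Same shape as the frame above, but with predictable values 0..16
# (an assumption made here only so the averages are easy to verify).
idx = pd.date_range(start=datetime.datetime(2015, 1, 30), periods=17)
df = pd.DataFrame(data=np.arange(17, dtype=float), index=idx)

# The only column is numeric, so rolling() averages it; the dates live
# in the index and are carried through unchanged.
smoothed = df.rolling(4).mean()
```

`smoothed.values` then gives the plain array of averages, as in the original function, while `smoothed.index` still holds the dates.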

If, however, you don't want that and are happy to have the DataFrame un-indexed, you can work around the rolling issue by temporarily setting the index to the Timestamp column:

import pandas as pd
import numpy as np
import datetime

datedat1 = np.array([pd.date_range(start=datetime.datetime(2015, 1, 30), periods=17), np.random.rand(17)]).T
datedatF = pd.DataFrame(datedat1)

# Temporarily set the index, compute the rolling mean, and then
# take the values of the entire DataFrame. The remaining column comes
# from an object array, so it is cast to float before rolling.
vals = datedatF.set_index(0).astype(float).rolling(5).mean().reset_index().values

I would suggest, however, that creating the DataFrame with an index is the better option (consider what happens if the datetimes are not sorted and you call rolling on the DataFrame).
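
The point about unsorted datetimes can be illustrated with a short sketch. It assumes a small date-indexed frame with a hypothetical column name `value`, and reverses the rows to stand in for out-of-order data.

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start="2015-01-30", periods=6)
df = pd.DataFrame({"value": np.arange(6, dtype=float)}, index=idx)

# Reverse the rows so the dates are no longer ascending.
unsorted = df.iloc[::-1]

# A count-based window follows row order, not date order, so here each
# date is averaged with the dates that come after it in time.
by_row_order = unsorted.rolling(3).mean()

# Sorting the index first restores the chronological meaning.
by_date_order = unsorted.sort_index().rolling(3).mean()
```

With the dates in the index, `sort_index()` is a one-line fix; with the dates in an ordinary column the ordering problem is easier to overlook.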

emmet02
  • The `rolling(5).mean()` didn't work in your code. `vals` returns the same data as `datedat1`. – Harry Jan 11 '18 at 14:01
  • Apologies, I forgot the .T, have edited now and it should work. – emmet02 Jan 11 '18 at 14:19
  • So the Timestamps aren't actually averaged? `set_index(0)` protects the first column? – Harry Jan 11 '18 at 14:29
  • set_index moves the column from the 'dataframe values' into the index of the dataframe. The operation .rolling() applies to columns in the dataframe only. Once the values within the dataframe are numeric only, the .rolling() method can be applied. After we apply it, we then reset the index of the dataframe, 'moving' the timestamps column into the dataframe proper. We then return all values of the dataframe, the timestamps (un-touched by the .rolling()) and the rolling averages of the other values. – emmet02 Jan 11 '18 at 14:34
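
The three steps described in this last comment can be followed one at a time. The sketch below assumes a frame built from a dict with integer column labels 0 and 1 (to mirror the answer's `set_index(0)`), so that column 1 is already float; the names `indexed`, `rolled` and `restored` are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    0: pd.date_range(start="2015-01-30", periods=6),
    1: np.arange(6, dtype=float),
})

# Step 1: the timestamps move out of the columns and into the index.
indexed = frame.set_index(0)

# Step 2: only the remaining numeric column is averaged.
rolled = indexed.rolling(3).mean()

# Step 3: the timestamps come back as column 0, unchanged.
restored = rolled.reset_index()
```

Inspecting `indexed.columns` after step 1 shows that only column 1 is left for `.rolling()` to operate on.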