1

I have a multi-index based on dates and times (grouped by times). Data looks like this:

                             r2     var_v2x          _s
date       time                                        
2017-10-02 08:05:00  0.00000031  0.00008784  0.00353914
2017-10-03 08:05:00  0.00000008  0.00009055  0.00085233
2017-10-04 08:05:00  0.00000000  0.00018774  0.00000000
2017-10-05 08:05:00  0.00000031  0.00007609  0.00409934

and for another time period:

                             r2     var_v2x          _s
date       time                                        
2017-10-02 08:30:00  0.00000000  0.00008784  0.00000000
2017-10-03 08:30:00  0.00000008  0.00009055  0.00085375
2017-10-04 08:30:00  0.00000008  0.00018774  0.00041180
2017-10-05 08:30:00  0.00000000  0.00007609  0.00000000
2017-10-10 08:30:00  0.00000008  0.00006900  0.00112288

Now I want to apply a rolling mean to column _s with a self-extendable rolling window. This means for the first observation within a group the rolling window is 1 (so just the value, 2nd observation includes 1st and 2nd value of column _s and so on.

In the end, I want to have the ungrouped dataframe like:

                             r2     var_v2x          _s      s_rolling
date       time                                        
2017-10-02 08:05:00  0.00000031  0.00008784  0.00353914     0.00353914
2017-10-02 08:10:00  0.00000024  0.00008784  0.00249531     rolling(2)
2017-10-02 08:15:00  0.00000024  0.00008784  0.00249531     rolling(3)
2017-10-02 08:20:00  0.00000024  0.00008784  0.00249531     rolling(4)

So far I have this:

`file_name = r'E:\Model\ModelSpecific\Index_shat.txt'

 df = pd.read_csv(file_name, parse_dates=[0], index_col=None, sep=',')

 list_date = sorted(set(df['Date']))
 list_time = sorted(set(df['Time']))
 iterables = [list_date, list_time]
 indexed = pd.MultiIndex.from_product(iterables, names=['date', 'time'])
 df = df.set_index(indexed)

 df = df.dropna()
 df['_s'] = df['r2']/df['var_v2x']

 for date, new_df in enumerate(df.groupby(level=1)):
      new_df = pd.DataFrame(new_df)
      new_df['rolling_s'] = new_df.expanding().mean()
      print(new_df)

But returns: ValueError: Wrong number of items passed 3, placement implies 1.

I also tried enumerate(df.groupby(level=1)):
But returns: AttributeError: 'tuple' object has no attribute 'rolling'

Maeaex1
  • 703
  • 7
  • 36
  • 2
    You want `.expanding().mean()`. but not sure exactly what the code is since your input isn't complete enough to create the output. – ALollz Sep 21 '18 at 16:43
  • @ALollz added my full code. Tried `.expanding().mean()` but it returns: AttributeError: 'tuple' object has no attribute 'rolling'. I then tried to transform the tuple into a `DataFrame`but this returns: TypeError: DataFrame constructor called with incompatible data and dtype: setting an array element with a sequence – Maeaex1 Sep 21 '18 at 17:04
  • 1
    You don't need enumerate. just `for date, new_df in df.groupby():`, then `new_df` is your `DataFrame` – ALollz Sep 21 '18 at 17:11

1 Answers1

0
df_needed = df['_s']
df_needed.rename(columns = {'_s':'s_hat'}, inplace=True)

lista = []

for date, new_df in df_needed.groupby(level=1):
    new_df = pd.DataFrame(new_df)
    rolling = new_df[0].expanding().mean()
    lista.append(rolling)

df_init = pd.concat(lista,axis=0)

df_joined = df.join(df_init)
df_joined.columns = ['r2', 'var_v2x', 'r2_var_2x', 's_hat']

df_joined.to_csv(outfile)

Final code and works well! Thanks!

Maeaex1
  • 703
  • 7
  • 36