0

EDIT SUMMARY So I have: p1 dataframe with the prices and the returns of n-financial instrument of the last 5 years:

       Date         a_price   b_price   a_ret    b_ret
  0  2018-04-13      6.335     5.114    0.0047    0.01
         .
         .
         .
1272 2023-04-13      11.525     5.708    0.0039   -0.04

Then I have another dataframe seed, which contains only one date and the Standard deviations of the same n-instruments

      Date         a_SD    b_SD
 63  2023-04-13   0.019   0.017

In p1 I need to create n-new columns EWMA as follows: for the oldest date in p1 (2018-04-13), a_EWMA = X*(seed[a_SD]**2) + (1-X) *(p1[a_Ret]*2) for any other date a_EWMA = X(p1.loc[i-1,a_EWMA]**2) + (1-X) *(p1[a_Ret]**2).

The desired output is p1 as follows:

  Date            a_price   b_price   a_ret    b_ret   a_EWMA   b_EWMA
 0   2018-04-13    6.335     5.114    0.0047    0.01     ...     ...
         .
         .
         .
1272 2023-04-13   11.525     5.708    0.0039   -0.04     ...     ...

Since I need to automatize, I'm trying to use f'{col} to match the 2 dataframes but I think I'm not doing it correctly. I'm doing it with the below function but it gives me all None results and worst it's creating too many columns while it should create only n-more (2 in this example)

def EWMA(x):
    for i in p1.index:
        if p1.loc[i,'Date'] == min_Date and '_SD' in seed.columns:
            return math.sqrt((lam*(seed[[f'{col}_SD'for col in seed.columns]])**2) + (1-lam) * (p1.loc[i,[f'{col}_Ret' for col in p1.columns]]**2))
        elif p1.loc[i,'Date'] != min_Date and '_SD' in seed.columns:
            return math.sqrt(lam*((p1.loc[i-1,[f'{col}_EWMA'for col in p1.columns]])**2)+(1-lam)*(p1.loc[i,[f'{col}_Ret'for col in p1.columns]]**2)) if f'_Ret' in p1.columns:
    p1[[f'{col}_EWMA'for col in p1.columns]] = p1.apply(EWMA)
CristinaK
  • 23
  • 3
  • Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. – Сергей Кох Apr 15 '23 at 19:38
  • I hope it is clearer now. – CristinaK Apr 15 '23 at 21:26
  • Please read [mre] with a focus on 'Minimal'. – MatBailie Apr 15 '23 at 21:29
  • I think the problem is more or less clear, but since the answer will rely very much on the data you're working with, it's hard to come up with a correct answer. What the others are asking for is if you could provide a simpler bit of code that shows the problem you're trying to solve with some example data, so that the needed operation can be clearly understood and explained. You should then be able to apply the solution we can help find to the original problem as well. – Grismar Apr 15 '23 at 21:31
  • Where you say " if the Date is the lowest one in the dataframe (p1)", do you mean "... in the dataframe *so far*" or does the condition only apply when the actual minimum for the entire dataframe is encountered? Is the data sorted by date in both dataframes? Are all timestamps in `seed` in `p1`, and vice versa? – Grismar Apr 15 '23 at 21:37
  • Hello :) I wrote a short summary with the function that is actually the problematic one (EWMA). @MatBailie – CristinaK Apr 16 '23 at 15:28
  • p1 dataframe contains the last 5 years of information of the instrument I select from yahoo finance while seed contains only the Standard dev computed. The date hilighted in seed is equal to the lowest date of p1. @Grismar – CristinaK Apr 16 '23 at 15:32
  • It seems like you've substantially changed the question - in a sense it feels like you're asking for something very different now. The data you provided helps a little, but you're still not showing how the data relates, as you only provide the first and last line of the input data (twice, since the desired result just shows where you want the new columns, no what their values should be). Are you aware that `pandas.DataFrame` has a `ewm` method? Is there a reason you're trying to define your own? What is the significance of 'price' in your data, since you don't use it at all? – Grismar Apr 16 '23 at 23:27
  • Also for the initial, you have `X*(seed[a_SD]**2) + (1-X) *(p1[a_Ret]*2)` but that's probably supposed to be `**2` instead of `*2`? Is there a specific reason you're squaring values here? (basic EWMA calculation would just follow EWMA[t]=λY[t]+(1−λ)EWMA[t−1]?) – Grismar Apr 17 '23 at 00:31
  • It's also still unclear what the relevance of the date is in your seed data? – Grismar Apr 17 '23 at 00:41

1 Answers1

0

Your question initially mentioned something about the dates in your data being relevant, and your seed values dataframe still contains a date that doesn't seem relevant to the solution. So, I've ignored the dates, assuming the source data has one record for each day and the date in the seed values is immaterial (as the source data starts at that date anyway).

With that in mind:

import datetime, random
import pandas as pd
import numpy as np

# random example data, structured like yours
dates = pd.date_range(datetime.date(2018, 4, 13), datetime.date(2023, 4, 13), freq='D')
df = pd.DataFrame(
    np.hstack([np.random.rand(len(dates), 2) * 10, np.random.rand(len(dates), 2)]),
    index=dates, columns=['a_price', 'b_price', 'a_ret', 'b_ret']
)
print(df)

# seed values (apparently standard deviations), date did not seem relevant to the example
seed_df = pd.DataFrame([{'a_SD': 0.019, 'b_SD': 0.017}])
print(seed_df)


def ewma(x, seed, ys):
    # generic EWMA with some seed value
    v = seed
    return [v := x*y + (1-x)*v for y in ys]


for col in seed_df.columns:
    df[f'{col[:-3]}_ewma'] = ewma(
        0.5,  # an arbitrary 'X', or lambda value for the EWMA
        seed_df[col][0]**2,  # the seed value for the column, getting [0] because it's a series, squared
        df[f'{col[:-3]}_ret']**2  # the column of returns squared, as you want them squared
    )
print(df)

This results in what I think you're asking for. If the dates were somehow relevant to your initial question, you should update the question to be more clear about their meaning.

Also, if your input data is not as straightforward as the example data I generate here, and that is causing problems, you should be more specific about how it might differ.

The example here squares the seed value and the variable for which the EWMA column is created, since you specified that is what you needed in the question - the ewma() function applies regardless.

Example output (random data):

             a_price   b_price     a_ret     b_ret
2018-04-13  4.888440  8.198954  0.395782  0.173080
2018-04-14  2.594048  1.365598  0.866661  0.096559
2018-04-15  9.906080  1.330396  0.071854  0.141243
2018-04-16  2.928997  1.552826  0.942410  0.969734
2018-04-17  9.558887  7.987781  0.861112  0.748358
...              ...       ...       ...       ...
2023-04-09  0.614546  0.671281  0.717163  0.064154
2023-04-10  2.644615  9.685066  0.286565  0.161650
2023-04-11  0.905363  2.342287  0.565717  0.784615
2023-04-12  0.301656  2.447690  0.936617  0.251708
2023-04-13  7.971918  3.588137  0.781405  0.855766

[1827 rows x 4 columns]
    a_SD   b_SD
0  0.019  0.017
             a_price   b_price     a_ret     b_ret    a_ewma    b_ewma
2018-04-13  4.888440  8.198954  0.395782  0.173080  0.078502  0.015123
2018-04-14  2.594048  1.365598  0.866661  0.096559  0.414802  0.012223
2018-04-15  9.906080  1.330396  0.071854  0.141243  0.209982  0.016086
2018-04-16  2.928997  1.552826  0.942410  0.969734  0.549059  0.478235
2018-04-17  9.558887  7.987781  0.861112  0.748358  0.645286  0.519137
...              ...       ...       ...       ...       ...       ...
2023-04-09  0.614546  0.671281  0.717163  0.064154  0.320588  0.197610
2023-04-10  2.644615  9.685066  0.286565  0.161650  0.201354  0.111870
2023-04-11  0.905363  2.342287  0.565717  0.784615  0.260695  0.363746
2023-04-12  0.301656  2.447690  0.936617  0.251708  0.568973  0.213551
2023-04-13  7.971918  3.588137  0.781405  0.855766  0.589784  0.472944

[1827 rows x 6 columns]
Grismar
  • 27,561
  • 4
  • 31
  • 54