How to calculate an average stock price depending on periods

Question

I am trying to calculate the average opening price for a stock, depending on different periods (week, month, year).

Here you can see a part of my df : My dataframe (987 rows for the complete df)

Firstly, I am trying to calculate the average opening price week by week. I found a solution, but it is unsustainable (it took my computer 5min to finish the calculations). Here it is :

def average_opening_and_closing_prices(df):
    
    array = [0]
    n = df["weekofyear"].count()
    j=0

    for i in range(0,n): 

        array[j] = array[j] + kdf["Open"][i]
        if i != n-1 and kdf["weekofyear"][i] != kdf["weekofyear"][i+1]:
            array.append(0)
            j = j+1
    
    for x in array:
      print(str(x) + " ") 
    
average_opening_and_closing_prices(AMAZON_df)

Could you help me to improve my solution (mainly on execution time) ? Also, for example, I would like to add a column, directly to my df, which contains the results for each week, instead of putting the results in an array.

I am not allowed to use pandas, I can only use pyspark and koalas.

You need the average of the "open" column? – jacob galam Dec 16 '20 at 19:19 — jacob galam, Dec 16 '20 at 19:19
yes, I need the average for the "open" column – Ousen92i Dec 16 '20 at 20:15 — Ousen92i, Dec 16 '20 at 20:15

Vijay · Accepted Answer · 2020-12-16T20:55:29.080

2

[UPDATED: To include year into the calculation] As you are looking for average price for week (and year) and already added the weekofyear in data frame, panda's itself can do it for you. Just add a column for year and try df.groupby(['year', 'weekofyear']).mean() Sample below:

import pandas as pd

df = pd.DataFrame({
    'weekofyear' : [1, 1, 1, 2, 2, 2, 3, 3, 3], 
    'year' : [2017, 2017, 2018, 2017, 2017, 2018, 2017, 2017, 2018],
    'Open' : [757, 758, 759, 761, 761, 762, 763, 764, 764]
})

result_df = df.groupby(['year', 'weekofyear']).mean()

print(result_df)

Output

Open
year weekofyear       
2017 1           757.5
     2           761.0
     3           763.5
2018 1           759.0
     2           762.0
     3           764.0

edited Dec 16 '20 at 20:55

answered Dec 16 '20 at 19:37

Vijay

778
4
9

Thanks for your reply ! The problem with your solution is that it mixes weeks from different years. As I said I have 987 rows, wich represent data for 4 years. So when I group by weekofyear, it groups week 1 from 2017, week 1 from 2018, week 1 from 2019. Therefore, the calculations are incorrect. – Ousen92i Dec 16 '20 at 20:11
I have updated the answer, you can add year column in DataFrame and then group by year and weekofyear and you get the result. – Vijay Dec 16 '20 at 20:56
Thanks, it works :) But I have a question. Do you know how can I avoid problem like this : 2019-12-30 and 2019-12-31 are counted as week 1 for 2020. I know it is because week 1 starts with a monday, and the first day of 2020 is not a monday, so it tooks the begining of the week, which belong to 2019 – Ousen92i Dec 16 '20 at 21:27
Because now, when I groupby week and year, I will have the first week 1 for 2019, and a second week 1 which is at the end of 2019, and the will be grouped as one week 1 – Ousen92i Dec 16 '20 at 21:31
1

It's a separate problem then your question, but you can use strftime instead of weekofyear e.x. `df['week_format'] = df['date'].dt.strftime("%U")` instead of `df['weekofyear'] = df['date'].dt.weekofyear` – Vijay Dec 17 '20 at 12:46

jacob galam · Answer 2 · 2020-12-16T20:17:45.760

0

Try that:

from statistics import mean 
average = mean([week["Open"] for week in df["weekofyear"]])

Search about "pythonic" for loop, loop over items and not indexes. https://www.w3schools.com/python/python_for_loops.asp

edited Dec 16 '20 at 20:17

answered Dec 16 '20 at 19:22

jacob galam

781
9
21

Thanks for your reply, but when i tryed your solution, i got the error: PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead. – Ousen92i Dec 16 '20 at 20:16
I guess it is because your solution uses pandas and my df is a koalas df ? – Ousen92i Dec 16 '20 at 20:17
try: `df = df.to_dict()` – jacob galam Dec 16 '20 at 20:19
I did it, i got : TypeError: 'int' object is not subscriptable – Ousen92i Dec 16 '20 at 20:47

How to calculate an average stock price depending on periods

2 Answers2