3

I am new to python and learning it by doing some basic stock data analysis. Below is the dataframe I am using

                      date      open      high  ...       close  volume 
0      2010-01-05 09:16:00   5282.00   5283.10  ...   5281.10  94700    NaN
1      2010-01-05 12:16:00   5281.60   5281.60  ...   5278.30  49100    NaN
2      2010-01-05 16:16:00   5278.50   5280.50  ...   5278.80  62550    NaN
3      2010-01-06 09:16:00   5278.80   5279.45  ...   5277.30  64850    NaN
4      2010-01-06 12:16:00   5277.95   5278.00  ...   5276.00  65251    NaN

As you can see its a timeseries where there are different timeslots within a day. So I want to find the prtc_change (Percentage change) open of 2010-01-06 09:16:00 as compared to the close of 2010-01-05 16:16:00. how would I calculate it?

This is the kind of output I am looking for:

                      date      open      high  ...       close  volume %change
0      2010-01-05 09:16:00   5282.00   5283.10  ...   5281.10  94700    
1      2010-01-05 12:16:00   5281.60   5281.60  ...   5278.30  49100    
2      2010-01-05 16:16:00   5278.50   5280.50  ...   5278.80*  62550    
3      2010-01-06 09:16:00   5278.80*   5279.45  ...   5277.30  64850    0
4      2010-01-06 12:16:00   5277.95   5278.00  ...   5276.00  65251    

The %change column has 0 for 2010-01-05-close to 2010-01-05 9:16-open because the open = close (5278.80 == 5278.80) (marked by *).

Note : I have manipulated the data a little bit as I was working on it . below are the codes

import pandas as pd
import datetime

df = pd.read_csv(r'C:\Users\Admin\Desktop\Python files\nifty.txt' , sep = ';' , names = ["dates","open","high","low","close","volume"])
## fomration the date and time
df['dates'] = pd.to_datetime(df['dates'].astype(str) , format='%Y%m%d %H%M%S' )
## splitting the datetime column into date and time
df['date'] = [d.date() for d in df['dates']]
df['time'] = [d.time() for d in df['dates']]

the current dataframe looks like :

                     dates      open      high  ...  volume        date      time
0      2010-01-05 09:16:00   5282.00   5283.10  ...   94700  2010-01-05  09:16:00
1      2010-01-05 12:16:00   5281.60   5281.60  ...   49100  2010-01-05  12:16:00
2      2010-01-05 16:16:00   5278.50   5280.50  ...   62550  2010-01-05  16:16:00
3      2010-01-06 09:16:00   5278.80   5279.45  ...   64850  2010-01-05  09:16:00
4      2010-01-06 12:16:00   5277.95   5278.00  ...   65251  2010-01-05  12:16:00
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Fudgster
  • 43
  • 1
  • 7
  • Hi welcome to stackoverflow. Do you mean for each day you want to compare earliest 'open' and the latest 'close'? It is always good to add the expected output to your question to facilitate understanding of your problem. – ScootCork Jun 29 '20 at 12:06
  • Well i wanted to get the %change of the latest close of a day and the earliest open of the next day . sorry abt that , ill edit my question with what kind of output is needed – Fudgster Jun 29 '20 at 12:23
  • @Fudgster Are the dates always sorted in your dataframe? – Shubham Sharma Jun 29 '20 at 13:39
  • yes shubham its data of a stock , so its always sorted – Fudgster Jun 29 '20 at 17:34
  • @Fudgster Then an alternative solution should be possible. – Shubham Sharma Jun 29 '20 at 17:36
  • and that would be ? oddly despite defining the date coloumn , vladis solution gives me an error , the comments under his answer i have posted this – Fudgster Jun 29 '20 at 17:53

2 Answers2

4

Pandas has pct_change function, but it computes the percent change between consecutive elements of a source Series, or for each column of numeric type in a source DataFrame.

So in your case it is useless, and you need a different approach:

  1. The first step is to find the first open and last close on each day:

     days = df.groupby(df.date.dt.date).agg({'open': 'first', 'close': 'last'})
    
  2. Then, calculate the percent change:

     100 * (days.open - days.close.shift()) / days.open
    

Details:

  • days.open - the earliest open from the current day.
  • days.close.shift() - the latest close from the previous day.
  • 100 * ... - to express the result as the number of percent.

The second step is to "join" these data with the original DataFrame (create a new column):

  1. Define a function computing %change column for a group of rows for particular day:

     def pctChg(grp):
         rv = pd.Series('', index=grp.index)
         chg = days.pct.asof(grp.iloc[0, 0])
         if pd.notnull(chg): rv.iloc[0] = chg
         return rv
    
  2. Then create the new column:

     df['%change'] = df.groupby(df.date.dt.date)\
         .apply(pctChg).reset_index(level=0, drop=True)
    
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
  • I think you misunderstood OP's question... he want the percentage change between the prev. day's closing and the current day's opening... – Sabito stands with Ukraine Jun 29 '20 at 12:39
  • thanks , as Yatin said above , logic was a bit off . but thanks for the approach . this approach can be used . i can identify the day close to next day open and then calculate the percentage . – Fudgster Jun 29 '20 at 13:34
  • @vladi , i tied the codes to get the earliest open and earliest close in the dataframe . but ended up with an error . saying AttributeError: 'DatetimeProperties' object has no attribute 'dates' . i have edited my question as I have manipulated the dta a little bit . posting the codes for them too – Fudgster Jun 29 '20 at 17:59
  • When my code was written, you had only *date* column, as I assumed, of *datetime* type (+ some *float* columns). Now you have changed your source data structure, so my code may fail. It is a bad habit to change assumptions and then ask for corrections of the initial solution. This is rather a material for **another** question. – Valdi_Bo Jun 29 '20 at 18:33
  • sorry vlaid , i changed the dataframe back , but oddly got the same issue . next post onwards . ill make sure not to change the format . thanks a ton though – Fudgster Jun 29 '20 at 18:45
3

Use:

df['dates'] = pd.to_datetime(df['dates'])
close = df['close'].shift()
df['% change'] = np.where(
    df['date'].dt.day.diff().gt(0), ((df['open'] - close) / close) * 100, '')

Result:

# print(df)
                 date     open     high   close  volume % change
0 2010-01-05 09:16:00  5282.00  5283.10  5281.1   94700         
1 2010-01-05 12:16:00  5281.60  5281.60  5278.3   49100         
2 2010-01-05 16:16:00  5278.50  5280.50  5278.8   62550         
3 2010-01-06 09:16:00  5278.80  5279.45  5277.3   64850      0.0
4 2010-01-06 12:16:00  5277.95  5278.00  5276.0   65251         
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53
  • Hi Shubham , thanks for the reply , this is to find the %change on every timeslot . i am only looking at when the day changes . ex: 2010-01-05 16:16:00 is the final time slot , so using the close of that and the opening of the next day . – Fudgster Jun 29 '20 at 13:05