3

I have a pandas data frame, where datetime is the index of the data frame (I use t=0 for simplification, in fact there is something like 20170101 09:30:00)

datetime    Stock A    Stock B
t=0           5          20
t=1           6          30
t=2           8          25
t=3           4          20

and I would like to return:

datetime    Stock A    Stock B
t=0           100        100
t=1           120        150
t=2           140        125
t=3           80         100

in mathematical terms: Index(i, t) = P(i, t) / P(i, 0).

I tried

df_norm =  df[0:] / df[0:1]
print(df_norm)

which gives me an error.

edit1: I tried option 3 which works fine (couldn't try on NaN's yet, but at least it does not create an NaN for the first obs (which is caused by pctchange)). I wonder also that after performing, my datetime is not the set index anymore, which is easy to fix by just re-assigning it.

Now I am trying now to wrap it in a function, but I think the index is causing a problem (actually same error as with my "first" attempt):

def norming(x):
    return x.assign(**x.drop('datetime', 1).pipe(
    lambda d: d.div(d.shift().bfill()).cumprod()))

edit2: if my column datetime is an index, i.e.

df_norm.set_index(['datetime'], inplace = True)

I'll get an error though, what would I need to change?

eternity1
  • 651
  • 2
  • 15
  • 31

2 Answers2

3

Option 1

df.set_index('datetime').pct_change().fillna(0) \
  .add(1).cumprod().mul(100).reset_index()

  datetime  Stock A  Stock B
0      t=0    100.0    100.0
1      t=1    120.0    150.0
2      t=2    160.0    125.0
3      t=3     80.0    100.0

Option 2

def idx_me(a):
    a = np.asarray(a)
    r = np.append(1, a[1:] / a[:-1])
    return r.cumprod() * 100

df.assign(**df.drop('datetime', 1).apply(idx_me))

  datetime  Stock A  Stock B
0      t=0    100.0    100.0
1      t=1    120.0    150.0
2      t=2    160.0    125.0
3      t=3     80.0    100.0

Option 3

df.assign(**df.drop('datetime', 1).pipe(
    lambda d: d.div(d.shift().bfill()).cumprod().mul(100)))

  datetime  Stock A  Stock B
0      t=0    100.0    100.0
1      t=1    120.0    150.0
2      t=2    160.0    125.0
3      t=3     80.0    100.0
piRSquared
  • 285,575
  • 57
  • 475
  • 624
3

Seems like

p=100/df.iloc[0,1:]
df.iloc[:,1:]*=p
df
Out[1413]: 
  datetime StockA StockB
0      t=0    100    100
1      t=1    120    150
2      t=2    160    125
3      t=3     80    100
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Much better answer than mine! – piRSquared Mar 31 '18 at 02:07
  • @piRSquared yours better since he get the return rate as well :-) – BENY Mar 31 '18 at 02:08
  • 1
    My version of this `df.assign(**df.drop('datetime', 1).pipe(lambda d: d.div(d.iloc[0]).mul(100)))` – piRSquared Mar 31 '18 at 02:09
  • @piRSquared this is awesome Sir :-) – BENY Mar 31 '18 at 02:09
  • Thank you both for your answer! In option 1 I see "fillna", I do have NaN in my data from time to time and actually would want to keep it as such. Would that affect any of the other solutions too? Additionally, if not too much of an effort, would you mind to comment on the code? There are specific parts of the code and syntax I do not really understand. Happy Easter! – eternity1 Mar 31 '18 at 03:42
  • Use ffill and mask. If wen gets to it, great otherwise I’ll edit it in when I get to a computer – piRSquared Mar 31 '18 at 03:50