0

Consider the following data frame with a timestamp index which may have repeated (i.e. non-unique) index values, another column that indicates the asset, and another column with the value of that asset at that timestamp.

df
                     value asset
2021-03-18 11:00:00      4     A
2021-03-18 11:30:00      1     B
2021-03-18 12:00:00      3     A
2021-03-18 12:30:00      2     A
2021-03-18 13:00:00      3     A
2021-03-18 13:30:00      3     A
2021-03-18 14:00:00      1     A
2021-03-18 14:30:00      2     B

For each day, I would like to get the final value of each asset and subtract that from the value in each row, per asset. So in the above table, the last daily value for asset A is 1 (at time 2021-03-18 14:00:00), and for B is 2 (at time 2021-03-18 14:30:00). I would then like to deduct these values from the respective asset value in each row. So in the first row I want to calculate new_value to equal 4-1 = 3, and for the second row to be 1-2 = -1.

How can I do that, taking into account that some index values may be repeated since they represent the time at which each asset was traded, and two assets may be traded at the same time.

finstats
  • 1,349
  • 4
  • 19
  • 31

1 Answers1

0

You can use a groupby/transform with the "last" function:

df["new_value"] = df["value"] - df.groupby("asset")["value"].transform("last")

print(df)
                     value asset  new_value
2021-03-18 11:00:00      4     A          3
2021-03-18 11:30:00      1     B         -1
2021-03-18 12:00:00      3     A          2
2021-03-18 12:30:00      2     A          1
2021-03-18 13:00:00      3     A          2
2021-03-18 13:30:00      3     A          2
2021-03-18 14:00:00      1     A          0
2021-03-18 14:30:00      2     B          0

The groupby/transform operation is simply grouping our values by "asset" and getting the "last" element from each group. Then it maps those values back to their original groups- so every element within that group becomes the "last" element. From there we di simple subtraction between 2 arrays that have the same shape.

s = df.groupby("asset")["value"].transform("last")

print(s)

2021-03-18 11:00:00    1
2021-03-18 11:30:00    2
2021-03-18 12:00:00    1
2021-03-18 12:30:00    1
2021-03-18 13:00:00    1
2021-03-18 13:30:00    1
2021-03-18 14:00:00    1
2021-03-18 14:30:00    2
Name: value, dtype: int64
Cameron Riddell
  • 10,942
  • 9
  • 19
  • To factor in possible variance in date. Maybe we could `df.groupby(['asset',df.index.date])['value'].transform('last')` – wwnde Mar 18 '21 at 22:53
  • Don't want to groupby date because then we'll turn each individual row into a group. Though to address possible ordering artifacts, a call to `.sort_index()` before `.groupby` should do the trick, since record order within each group is preserved. – Cameron Riddell Mar 18 '21 at 23:46