The title may not be the most informative.
I have the following working code that I want to vectorize (no for loops) using native pandas.
Basically, it should return, for each row, its cumulative offset from 0, given the length of each segment and a relative offset within that segment.
import pandas as pd
import numpy as np

df = pd.DataFrame({"id": [0, 1, 2, 2, 2, 3, 3, 4, 5, 6, 6, 7, 9],  # notice no 8
                   "length": [0, 10, 20, 20, 20, 30, 30, 40, 50, 60, 60, 70, 90],
                   "offset": [0, 0, 1, 3, 4, 0, 7, 0, 0, 0, 1, 0, 0]})

result = np.zeros((len(df),))
current_abs = df.loc[0, "id"]  # id of the segment currently being processed
for i in range(1, len(df)):
    if current_abs == df.loc[i, "id"]:
        # same segment as the previous row: carry the running total forward
        result[i] = result[i - 1]
    else:
        # new segment: add its length to the running total
        current_abs = df.loc[i, "id"]
        result[i] = result[i - 1] + df.loc[i, "length"]

df["offset_from_start"] = result + df["offset"]
print(df)
    id  length  offset  offset_from_start
0    0       0       0                  0
1    1      10       0                 10
2    2      20       1                 31
3    2      20       3                 33
4    2      20       4                 34
5    3      30       0                 60
6    3      30       7                 67
7    4      40       0                100
8    5      50       0                150
9    6      60       0                210
10   6      60       1                211
11   7      70       0                280
12   9      90       0                370
This seems like a fancy cumsum operation, but I don't know how to do it efficiently.
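For reference, here is a minimal sketch of what a cumsum-based version might look like. It assumes that rows with the same id are always contiguous (as in the example) and that the first row's length is never added, which is what the loop above does. The names `new_segment` and `base` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2, 2, 2, 3, 3, 4, 5, 6, 6, 7, 9],
                   "length": [0, 10, 20, 20, 20, 30, 30, 40, 50, 60, 60, 70, 90],
                   "offset": [0, 0, 1, 3, 4, 0, 7, 0, 0, 0, 1, 0, 0]})

# True on rows whose id differs from the previous row's id (start of a new segment).
new_segment = df["id"].ne(df["id"].shift())
# Assumption: mirror the loop, which starts at row 1 and never adds row 0's length.
new_segment.iloc[0] = False

# Keep each segment's length only on its first row, zero elsewhere, then accumulate.
base = df["length"].where(new_segment, 0).cumsum()

df["offset_from_start"] = base + df["offset"]
print(df)
```

On the sample data this should reproduce the loop's values, although the column comes out as integers rather than floats because no `np.zeros` array is involved.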