I have data that needs to be extrapolated ('Y' vs 'X') for every 'id' (group) until 'Y' drops to zero.
Here, 'X' is 'cycle' and 'Y' is 'Covariate'.
However, the last value of 'Covariate' is different for every 'id'.
This is my code:
## Import the required Python libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
## Create a dataset
data = {'id': [1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4],
'cycle': [1, 2, 3, 4, 5, 6, 7, 8,
1, 2, 3, 4, 5, 6,
1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
1,2,3,4],
'Covariate': [1, 0.99, 0.97, 0.95, 0.9, 0.87, 0.86, 0.81,
1, 0.97, 0.94, 0.93, 0.90, 0.89,
1, 0.99, 0.96, 0.93, 0.89, 0.88, 0.85, 0.83, 0.82, 0.8,
1, 0.96, 0.94, 0.9],
}
## Convert to dataframe
df = pd.DataFrame(data)
print("df = \n", df)
Plotted for each 'id', 'Covariate' vs 'cycle' from the above data frame looks like this:
This is my code for the extrapolation:
## 1st order fitting
Order_fit = 1
## Create an empty array
df_Extrapolated = np.empty(shape=[0, 3])
## Iterate over all ids
for i in range(0, 4):
    ## id under consideration
    id_Number = i + 1
    ## Initialize the fit for the specific id
    fit = np.polyfit(df.groupby(by="id").get_group(id_Number)['cycle'],      ## X-axis data
                     df.groupby(by="id").get_group(id_Number)['Covariate'],  ## Y-axis data
                     Order_fit,                                              ## Order of fit
                     )
    Extrapolate = np.poly1d(fit)
    ## Get the last cycle for every id
    Last_cycle_ith_id = df[['id', 'cycle']].groupby('id').max().reset_index()['cycle'][i]
    ## Create new X-axis data points for every id (5 points, starting at the last observed cycle)
    X_axis_data_new_ith_id = np.arange(5) + Last_cycle_ith_id
    ## Create new Y-axis data points for every id
    Y_axis_data_new_ith_id = Extrapolate(X_axis_data_new_ith_id)
    ## Create an array holding the id, one entry per extrapolated point
    array_ith_id = np.array([id_Number] * np.shape(Y_axis_data_new_ith_id)[0])
    ## Store the extrapolated data for ith id in an array
    Extrapolated_data_ith_id = np.vstack((array_ith_id, X_axis_data_new_ith_id, Y_axis_data_new_ith_id)).transpose()
    ## Extrapolated_data for all ids
    df_Extrapolated = np.append(df_Extrapolated, Extrapolated_data_ith_id, axis=0)
df_Extrapolated = pd.DataFrame(df_Extrapolated, columns =['id', 'cycle_extrapol', 'Covariate_extrapol'])
print("\n df_Extrapolated = \n",df_Extrapolated)
## Plot the data
plt.figure(figsize=(10,10))
plt.subplot(221)
plt.plot(df.groupby(by="id").get_group(1)['cycle'], df.groupby(by="id").get_group(1)['Covariate'],'b',label = 'Actual Data')
plt.plot(df_Extrapolated.groupby(by="id").get_group(1)['cycle_extrapol'], df_Extrapolated.groupby(by="id").get_group(1)['Covariate_extrapol'], 'r',label = 'Extrapolated Data')
plt.xlabel('cycle')
plt.ylabel('Covariate')
plt.legend()
plt.title('id 1')
plt.subplot(222)
plt.plot(df.groupby(by="id").get_group(2)['cycle'], df.groupby(by="id").get_group(2)['Covariate'],'b',label = 'Actual Data')
plt.plot(df_Extrapolated.groupby(by="id").get_group(2)['cycle_extrapol'], df_Extrapolated.groupby(by="id").get_group(2)['Covariate_extrapol'], 'r',label = 'Extrapolated Data')
plt.xlabel('cycle')
plt.ylabel('Covariate')
plt.legend()
plt.title('id 2')
plt.subplot(223)
plt.plot(df.groupby(by="id").get_group(3)['cycle'], df.groupby(by="id").get_group(3)['Covariate'],'b',label = 'Actual Data')
plt.plot(df_Extrapolated.groupby(by="id").get_group(3)['cycle_extrapol'], df_Extrapolated.groupby(by="id").get_group(3)['Covariate_extrapol'], 'r',label = 'Extrapolated Data')
plt.xlabel('cycle')
plt.ylabel('Covariate')
plt.legend()
plt.title('id 3')
plt.subplot(224)
plt.plot(df.groupby(by="id").get_group(4)['cycle'], df.groupby(by="id").get_group(4)['Covariate'],'b',label = 'Actual Data')
plt.plot(df_Extrapolated.groupby(by="id").get_group(4)['cycle_extrapol'], df_Extrapolated.groupby(by="id").get_group(4)['Covariate_extrapol'], 'r',label = 'Extrapolated Data')
plt.xlabel('cycle')
plt.ylabel('Covariate')
plt.legend()
plt.title('id 4')
plt.show()
The extrapolated data then looks like this:
The limitation here is that I generate a fixed number of new X-axis data points (X_axis_data_new_ith_id) and then pass them through the 'Extrapolate' function to get new Y-axis data points (Y_axis_data_new_ith_id) for every id.
However, for every id, I need to run the extrapolation until the 'Y' data, i.e. 'Covariate', drops to zero, like this:
Can anyone please let me know how to achieve this in Python?
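For reference, here is a minimal sketch of one way this might be approached. It assumes a first-order fit with a negative slope, so the fitted line y = slope * x + intercept crosses zero at x = -intercept / slope, and the new cycles can run from the last observed cycle up to the first whole cycle at or past that crossing. The helper name extrapolate_to_zero and the small demo frame (ids 2 and 4 from the data above) are only illustrative, and ids whose fitted slope is not negative are simply skipped because their line never reaches zero.

```python
import numpy as np
import pandas as pd


def extrapolate_to_zero(df):
    """Hypothetical helper: per id, fit a straight line to 'Covariate' vs 'cycle'
    and extend it from the last observed cycle until the fitted value hits zero."""
    columns = ["id", "cycle_extrapol", "Covariate_extrapol"]
    frames = []
    for id_number, group in df.groupby("id"):
        # First-order (linear) fit; np.polyfit returns the highest power first
        slope, intercept = np.polyfit(group["cycle"], group["Covariate"], 1)
        if slope >= 0:
            # Assumption: a flat or rising line never reaches zero, so skip this id
            continue
        # Cycle at which the fitted line crosses zero
        zero_cycle = -intercept / slope
        last_cycle = group["cycle"].max()
        # Whole cycles from the last observed one up to the first cycle at or past zero
        new_cycles = np.arange(last_cycle, np.ceil(zero_cycle) + 1)
        # Evaluate the line and clip negative values so the curve ends at zero
        new_values = np.clip(slope * new_cycles + intercept, 0, None)
        frames.append(pd.DataFrame({"id": id_number,
                                    "cycle_extrapol": new_cycles,
                                    "Covariate_extrapol": new_values}))
    if not frames:
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)


# Demo on a subset of the data from the question (ids 2 and 4)
demo = pd.DataFrame({
    "id": [2, 2, 2, 2, 2, 2, 4, 4, 4, 4],
    "cycle": [1, 2, 3, 4, 5, 6, 1, 2, 3, 4],
    "Covariate": [1, 0.97, 0.94, 0.93, 0.90, 0.89, 1, 0.96, 0.94, 0.9],
})
result = extrapolate_to_zero(demo)
print(result.groupby("id").tail(3))
```

Because the number of new points is derived from each id's own zero crossing, every id gets a different number of extrapolated cycles. For a higher-order fit, the zero crossing would have to be found differently (for example from the roots of the fitted polynomial), which this sketch does not cover.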