0

I am trying to write code that will give me the change in the survival function for a 1 unit change in a particular covariate (pay_rate_mat in this case).

I am receiving the error:

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 6 is different from 2)

relating to line:

survival_functions = aft_model.predict_survival_function(combined_obs)

I have absolutely no idea what the issue is here as the data object: combined_obs looks perfectly fine to me. See full code below.


import pandas as pd
import numpy as np
from lifelines import WeibullAFTFitter
import matplotlib.pyplot as plt

# Generating a random dataset with 20,000 observations
np.random.seed(42)
n_obs = 20000
data = pd.DataFrame({
    'RELATION_LENGTH': np.random.randint(5, 50, n_obs),
    'CLOSED': np.random.randint(0, 2, n_obs),
    'rank_group': np.random.choice(['A', 'B', 'C'], n_obs),
    'avg_balance': np.random.uniform(1000, 5000, n_obs),
    'avg_rate': np.random.uniform(0.02, 0.08, n_obs),
    'avg_payrate_mat': np.random.randint(1, 10, n_obs),
    'year_joined': np.random.choice(['2005', '2010', '2015'], n_obs)
})

# Convert 'year_joined' to numeric category type
data['year_joined'] = data['year_joined'].astype('category')

# Fit the Weibull AFT model with covariates and interaction term
aft_model = WeibullAFTFitter()
covariates = ['avg_balance', 'avg_rate', 'avg_payrate_mat']
aft_model.fit(data, duration_col='RELATION_LENGTH', event_col='CLOSED', formula=' + '.join(covariates) + ' + rank_group:avg_payrate_mat')

# Find the most frequent values of 'rank_group' and 'year_joined'
most_frequent_rank_group = data['rank_group'].mode().iloc[0]
most_frequent_year_joined = data['year_joined'].mode().iloc[0]

# Create obs1 and obs2 with the same values as mean_values and most frequent values for rank_group and year_joined
mean_values = data[covariates].mean()

obs1_values = np.append(mean_values.values, [most_frequent_rank_group, most_frequent_year_joined])
obs2_values = np.append(mean_values.values, [most_frequent_rank_group, most_frequent_year_joined])

obs1 = pd.DataFrame([obs1_values], columns=covariates + ['rank_group', 'year_joined'])
obs2 = pd.DataFrame([obs2_values], columns=covariates + ['rank_group', 'year_joined'])

# Calculate survival probabilities for both observations while changing avg_payrate_mat
avg_payrate_mat_1 = 1
avg_payrate_mat_2 = 2

obs1['avg_payrate_mat'] = avg_payrate_mat_1
obs2['avg_payrate_mat'] = avg_payrate_mat_2

# Combine both observations into a new DataFrame
combined_obs = pd.concat([obs1, obs2])

# Calculate survival probabilities for the combined observations
survival_functions = aft_model.predict_survival_function(combined_obs)

# Separate survival functions for obs1 and obs2
obs1_survival = survival_functions.iloc[0]
obs2_survival = survival_functions.iloc[1]

# Plot the change in survival curves for the two observations
plt.plot(aft_model.timeline, obs1_survival, label=f'Observation 1 (avg_payrate_mat = {avg_payrate_mat_1})')
plt.plot(aft_model.timeline, obs2_survival, label=f'Observation 2 (avg_payrate_mat = {avg_payrate_mat_2})')
plt.xlabel('Time')
plt.ylabel('Survival Probability')
plt.title('Change in Survival Curve for Two Observations with the Same Covariates')
plt.legend()
plt.show()

McGez
  • 49
  • 3

0 Answers0