I'm trying to develop an XGBoost survival model. Here is a snippet of my code:
import xgboost as xgb
from sklearn.model_selection import train_test_split

X = df_High_School[['Gender', 'Lived_both_Parents', 'Moth_Born_in_Canada', 'Father_Born_in_Canada', 'Born_in_Canada', 'Aboriginal', 'Visible_Minority']]  # covariates
y = df_High_School[['time_to_event', 'event']]  # time to event and event indicator
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Develop the model
model = xgb.XGBRegressor(objective='survival:cox')
# Fit the model to the training data
model.fit(X_train, y_train)
It's giving me the following error:
ValueError Traceback (most recent call last) in
     18
     19 # fit the model to the training data
---> 20 model.fit(X_train, y_train)
     21
     22 # make predictions on the test set

2 frames
/usr/local/lib/python3.8/dist-packages/xgboost/core.py in _maybe_pandas_label(label)
    261 if isinstance(label, DataFrame):
    262     if len(label.columns) > 1:
--> 263         raise ValueError('DataFrame for label cannot have multiple columns')
    264
    265 label_dtypes = label.dtypes

ValueError: DataFrame for label cannot have multiple columns
As this is a survival model, I need two columns to indicate the event and the time_to_event. I also tried converting the DataFrames to NumPy arrays, but that didn't work either.
Any clue? Thanks!
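One possible direction, offered as a sketch rather than a confirmed fix: the XGBoost documentation for the `survival:cox` objective describes a single label column in which negative values are treated as right-censored. Under that convention, the two columns can be folded into one signed label. The helper name `to_cox_label` and the toy data below are made up for illustration, and the sketch assumes `event == 1` means the event was observed and that all times are strictly positive.

```python
import pandas as pd


def to_cox_label(time, event):
    """Fold (time, event) into one signed label (hypothetical helper).

    Rows with an observed event keep their positive time; rows that are
    right-censored (event != 1) get the negated time.
    """
    return time.where(event == 1, -time)


# Toy stand-in for df_High_School (made-up values, for illustration only).
df = pd.DataFrame({
    "Gender": [0, 1, 1, 0],
    "time_to_event": [5.0, 3.0, 8.0, 2.0],
    "event": [1, 0, 1, 0],
})

# Single-column label: positive = event observed, negative = censored.
y = to_cox_label(df["time_to_event"], df["event"])
print(y.tolist())
```

If `y` is built this way before calling `train_test_split`, `y_train` is a single-column Series, so `model.fit(X_train, y_train)` should no longer hit the multiple-columns check. A time of exactly zero cannot carry a sign, so such rows would need separate handling.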