I am sampling a larger data set to fit and predict with a statsmodels GLM model.
Depending on the sample, model.predict omits a small number (fewer than 10) of records from the array it returns. I assume it hits an error while processing those few rows.
For instance, if I predict using rows 15000:20000, the returned array has 4994 or 4997 elements, or something similar, instead of 5000.
This is a pain because I can't tell which rows are omitted, and I would like to run the .predict function on the entire dataframe and then easily add the prediction values as a new column.
Does anyone either (a) know what's going on and how to fix it, or (b) have a good method for adding the prediction values back to the dataframe based on index?
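
For option (b), here is a minimal sketch of one way to put predictions back by index. It rests on an assumption that has not been confirmed here: that the omitted rows are the ones with missing values (NaN) in the predictor columns. The helper name add_predictions, the column names x1 and x2, and the stand-in toy_predict function are all hypothetical; toy_predict only takes the place of a fitted model's predict method so the example runs without statsmodels. The index is assumed to be unique.

```python
import numpy as np
import pandas as pd


def add_predictions(df, predict_fn, predictor_cols, out_col="prediction"):
    """Return a copy of df with predictions aligned to the original index.

    Rows with a missing value in any predictor column are skipped and get
    NaN in out_col, so the result has exactly as many rows as the input.
    Assumes df has a unique index.
    """
    out = df.copy()
    # Keep only the rows where every predictor is present.
    complete = out.dropna(subset=predictor_cols)
    # Convert to a plain array so the assignment below goes by position
    # within the complete rows, whatever type predict_fn returns.
    preds = np.asarray(predict_fn(complete[predictor_cols]))
    if len(preds) != len(complete):
        raise ValueError(
            f"expected {len(complete)} predictions, got {len(preds)}"
        )
    out[out_col] = np.nan
    # Write each prediction back to the row label it came from.
    out.loc[complete.index, out_col] = preds
    return out


def toy_predict(exog):
    # Hypothetical stand-in for a fitted model's predict method.
    return 2.0 * exog["x1"] + exog["x2"]


df = pd.DataFrame(
    {"x1": [1.0, np.nan, 3.0, 4.0], "x2": [0.5, 1.0, np.nan, 2.0]},
    index=[15000, 15001, 15002, 15003],
)
scored = add_predictions(df, toy_predict, ["x1", "x2"])
print(scored)
```

With a fitted statsmodels results object, the predict method would be passed in place of toy_predict, with predictor_cols listing every column the model uses. If the ValueError is raised, rows are being dropped for some reason other than NaN in those columns, which would rule out the assumption above.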