I'm building a linear regression model where one of the input variables is the number of sales per day. Rather than using it as a linear input, I want to apply some form of cubic spline transformation, because the effect tails off after a certain point and the relationship before that point isn't linear.
I believe I can create cubic splines for this variable on my training dataset like so (and then build a linear model using them):
from patsy import dmatrix
transformed_x = dmatrix("bs(data, knots=(2000, 3000, 4000), degree=3, include_intercept=False)", {"data": df['Sales_Volume']}, return_type='dataframe')
But for making predictions for a single new data point, say for 5000 sales, how can I use these same splines to make a prediction on my fitted model?
If I try to run the same dmatrix call on just the single data point of 5000 sales, I get an error saying:
ValueError: some knot values ([2000 3000 4000]) fall below lower bound (5000)
It works if I have a large new dataset to predict on that covers the range of all of those knots, but can I be confident that repeating the same transformation on a new dataset will yield correct results?
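One possible way to reuse the training splines, sketched here under assumptions rather than as a definitive answer: patsy stores the fitted transformation on the training matrix as design_info, and build_design_matrices can apply it to new data instead of re-estimating the spline bounds from the new values. The training data below is synthetic (an assumed range of 500 to 6000 sales) and only stands in for the real df.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix, build_design_matrices

# Synthetic stand-in for the real training data (assumption): 500..6000 sales.
df = pd.DataFrame({"Sales_Volume": np.arange(500, 6001, 50)})

formula = "bs(data, knots=(2000, 3000, 4000), degree=3, include_intercept=False)"
transformed_x = dmatrix(formula, {"data": df["Sales_Volume"]},
                        return_type="dataframe")

# Apply the spline basis remembered from training to a single new point,
# rather than calling dmatrix again on the new data alone.
new_x = build_design_matrices([transformed_x.design_info],
                              {"data": [5000]},
                              return_type="dataframe")[0]

# Sanity check: the new row should match the training row for the same value.
train_row = transformed_x[df["Sales_Volume"] == 5000]
print(np.allclose(new_x.to_numpy(), train_row.to_numpy()))
```

The resulting new_x has the same columns as transformed_x, so it should be usable as input to the predict step of whatever linear model was fitted on transformed_x. In this sketch the new value lies inside the range of the training data; values outside that range may still raise an error.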