
I am trying to port this R functionality to Python:

> library(splines)
> x <- 0:10
> y <- x**2
> lm(y ~ ns(x,df=2))

My attempt in Python:

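import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Keep x and y as named columns so the formula can resolve them from `data`
data = pd.DataFrame({"x": np.arange(11)})
data["y"] = data["x"] ** 2
formula = "y ~ cr(x, df=3)"

# Fit the model and print the summary of the same result object
res = smf.ols(formula, data=data).fit()
print(res.summary())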

However, with this Python formulation I cannot set df < 3. How can I get a natural spline with two degrees of freedom in Python and use it in a patsy R-style formula?

  • I would inspect (probably graphically) the basis matrices generated by these two approaches. Are they both *natural* splines (i.e. linear constraints at the boundaries)? Do they both include or exclude the intercept term? – Ben Bolker Mar 08 '23 at 01:53
  • Is this a duplicate? https://stackoverflow.com/questions/71550468/does-python-have-an-analogue-to-rs-splinesns – Ben Bolker Mar 08 '23 at 02:09

1 Answer


These are clearly generating different bases: I'm not sure what the difference is, but the exploration below might help.

Note that cr mimics the basis construction from mgcv (see here; in addition to Simon Wood's book they are also discussed here), while ns() is a natural spline built on a B-spline basis. I believe that splines::bs() and patsy.bs would match perfectly, but there is no patsy.ns.

x <- 0:10
X1 <- model.matrix(~splines::ns(x, df = 3))
matplot(x, X1, type = "l")

[Figure: columns of the R basis matrix X1 from splines::ns(x, df = 3), plotted against x]

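import numpy as np
import pandas as pd
import patsy
import matplotlib.pyplot as plt

x = np.arange(11)
# Design matrix for the cr() basis; patsy adds an Intercept column by default
X2 = patsy.dmatrix(
    'cr(x, df = 3)',
    {'x': x}, return_type='dataframe')
plt.plot(X2)  # one line per column of the design matrix
plt.show()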

[Figure: columns of the patsy design matrix X2 from cr(x, df = 3)]
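For the two-degrees-of-freedom case specifically, one possible workaround is to build the natural cubic spline basis by hand. The block below is a minimal sketch, not patsy's or R's implementation: the function name natural_cubic_basis is made up for illustration, it uses the textbook truncated-power construction rather than B-splines, and it assumes the knot placement that ns(x, df = 2) uses by default as far as I know (one interior knot at the median of x, boundary knots at the range of x). Together with an intercept, these columns should span the same function space as ns(x, df = 2), so fitted values should agree with R even though the individual coefficients will not. I have not checked that against R.

```python
import numpy as np


def natural_cubic_basis(x, interior_knots, boundary_knots):
    """Natural cubic spline basis in truncated-power form, without the intercept.

    Hypothetical helper for illustration. Returns one column per knot minus one
    (so 2 columns for a single interior knot); each column is linear outside
    the boundary knots.
    """
    x = np.asarray(x, dtype=float)
    knots = np.concatenate(
        ([boundary_knots[0]], np.sort(interior_knots), [boundary_knots[1]])
    ).astype(float)
    last = knots[-1]

    def d(k):
        # Scaled difference of truncated cubics; the cubic terms cancel
        # to the right of the last knot.
        num = np.maximum(x - knots[k], 0.0) ** 3 - np.maximum(x - last, 0.0) ** 3
        return num / (last - knots[k])

    ref = len(knots) - 2  # index of the second-to-last knot
    cols = [x]  # the linear term
    for k in range(ref):
        # Subtracting d(ref) also cancels the quadratic terms beyond the last knot
        cols.append(d(k) - d(ref))
    return np.column_stack(cols)


x = np.arange(11)
y = x ** 2

# Assumed to mirror ns(x, df = 2): interior knot at the median, boundaries at the range
B = natural_cubic_basis(x, interior_knots=[np.median(x)],
                        boundary_knots=(x.min(), x.max()))

# Ordinary least squares with an explicit intercept column
X = np.column_stack([np.ones(len(x)), B])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
print(B.shape)
```

Since patsy formulas can call ordinary Python functions that are visible in the calling scope, something like "y ~ natural_cubic_basis(x, [5], (0, 10))" with fixed knots may work directly in smf.ols, but treat that as untested.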

Ben Bolker