You can run an ordinary least squares regression with a mix of Polars and NumPy. However, since Polars is not a statistics library, for anything beyond a quick calculation it makes more sense to reach for a dedicated library such as scikit-learn.
Here is an example of running a linear regression using Polars and NumPy:

```python
import polars as pl
import numpy as np

# Create a sample dataset
data = {
    'X1': [1, 2, 3, 4, 5],
    'X2': [2, 4, 6, 8, 12],
    'Y': [2, 4, 5, 4, 5],
}
df = pl.DataFrame(data)

# Design matrix X (with a column of ones for the intercept) and target Y
X = df.select('X1', 'X2', ones=pl.lit(1)).to_numpy()
Y = df['Y'].to_numpy()

# Solve the normal equations: theta = (X^T X)^-1 X^T Y
theta = np.linalg.inv(X.T @ X) @ (X.T @ Y)

# Attach the fitted values as a new column
df = df.with_columns(Y_pred=pl.Series(X @ theta))

print(df)
print(f"intercept: {theta[-1]}")  # coefficient of the 'ones' column
print(f"coef_x1: {theta[0]}")
print(f"coef_x2: {theta[1]}")
```
Output:

```
┌─────┬─────┬─────┬────────┐
│ X1  ┆ X2  ┆ Y   ┆ Y_pred │
│ --- ┆ --- ┆ --- ┆ ---    │
│ i64 ┆ i64 ┆ i64 ┆ f64    │
╞═════╪═════╪═════╪════════╡
│ 1   ┆ 2   ┆ 2   ┆ 2.7    │
│ 2   ┆ 4   ┆ 4   ┆ 3.4    │
│ 3   ┆ 6   ┆ 5   ┆ 4.1    │
│ 4   ┆ 8   ┆ 4   ┆ 4.8    │
│ 5   ┆ 12  ┆ 5   ┆ 5.0    │
└─────┴─────┴─────┴────────┘
intercept: 1.9999999999999947
coef_x1: 1.2000000000000357
coef_x2: -0.25000000000000533
```
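One caveat on the NumPy side: explicitly inverting X^T X is numerically fragile when the predictors are nearly collinear. `np.linalg.lstsq` solves the same least squares problem more robustly; a minimal sketch with the same data:

```python
import numpy as np

# Same design matrix as above: X1, X2, and a column of ones for the intercept
X = np.column_stack([
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 12],
    np.ones(5),
])
Y = np.array([2, 4, 5, 4, 5])

# lstsq avoids forming the inverse and also reports residuals and rank
theta, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(theta)  # ~[1.2, -0.25, 2.0]
```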