
I am writing this to understand more about regression in machine learning. When I set random_state to 42, Lasso seems to predict badly, but when I set it to 2 the results are much better. Is there any way to choose the random_state?

It is just a simple, very basic program. Here is my code:

import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf

data = yf.download('NG=F', '2010-06-06', '2023-06-06', auto_adjust=True) # natural gas futures ('NG=F' is natural gas, not gold)

print(data.head(20))
print(data.shape)
print(data.info())

data.Close.plot(figsize=(10,7))
plt.show()

from sklearn.model_selection import train_test_split

x = data.drop('Close', axis=1)
y = data['Close']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
from sklearn.linear_model import LinearRegression

line_near = LinearRegression()
line_near.fit(x_train, y_train)
predictions = line_near.predict(x_test)
print(f'Actual values: {y_test[0:10]}')
print(f'Predictions: {predictions[0:10]}')
from sklearn.metrics import mean_squared_error

mean_error = mean_squared_error(y_test, predictions)
print(mean_error)
from sklearn.linear_model import Ridge

ridge = Ridge()

ridge.fit(x_train, y_train)
ridge_predictions = ridge.predict(x_test)
print(f'Actual: {y_test[0:10]}')
print(f'Predictions: {ridge_predictions[0:10]}')
mean_error_ridge = mean_squared_error(y_test, ridge_predictions)
print(mean_error_ridge)
from sklearn.linear_model import Lasso

lasso = Lasso()
lasso.fit(x_train, y_train)
lasso_predictions = lasso.predict(x_test)
print(f'Actual: {y_test[0:10]}')
print(f'Predictions: {lasso_predictions[0:10]}')
mean_error_lasso = mean_squared_error(y_test, lasso_predictions)
print(mean_error_lasso)
  • `x` and `y` are undefined in your program. – rickhg12hs Jul 02 '23 at 14:37
  • I also note that in `data`, there are a couple of days when `Volume` is `0`. If these days are removed from the training set, does the performance meet your expectations? – rickhg12hs Jul 02 '23 at 14:52
  • `x = data.drop('Close', axis=1)` and `y = data['Close']` — I forgot to put these lines in my question. – nnguyenquy Jul 03 '23 at 15:36
  • I believe Lasso's regularization term is reducing the number of features used in the final prediction. Have a look at `lasso.coef_`. This reduced feature model just isn't as good as the other models you tried. You also might want to _play_ with Lasso's `alpha` parameter to see the effects. – rickhg12hs Jul 04 '23 at 16:38
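The coefficient check suggested in the last comment can be sketched like this. This is a minimal sketch using synthetic stand-in data (in the question, `x` and `y` come from the yfinance DataFrame), and the `alpha` values tried are arbitrary:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Stand-in data; in the question x/y come from the yfinance DataFrame.
x, y = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42)

for alpha in [0.01, 0.1, 1.0, 10.0]:
    lasso = Lasso(alpha=alpha, max_iter=10_000)
    lasso.fit(x_train, y_train)
    # Coefficients driven exactly to zero are features Lasso has dropped.
    n_zero = np.sum(lasso.coef_ == 0)
    print(f'alpha={alpha}: {n_zero} zeroed coefficients, '
          f'test R^2={lasso.score(x_test, y_test):.3f}')
```

If `lasso.coef_` has many exact zeros, the regularization is discarding features that the plain `LinearRegression` and `Ridge` models still use, which can explain the worse predictions.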

1 Answer


When the data set is small, the performance depends heavily on the split. You should run cross-validation and see how much the performance varies across splits.
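A cross-validation run can be sketched like this with scikit-learn's `cross_val_score`. This is a minimal sketch using synthetic stand-in data; in your program `x` and `y` would be the columns built from the yfinance DataFrame:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

# Stand-in for the x/y built from the yfinance DataFrame in the question.
x, y = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=0)

# 5-fold cross-validation: each fold serves once as the test set, so the
# error estimate no longer hinges on one lucky (or unlucky) random_state.
scores = cross_val_score(Lasso(max_iter=10_000), x, y,
                         scoring='neg_mean_squared_error', cv=5)
print(-scores)                         # per-fold MSE
print(-scores.mean(), scores.std())   # mean MSE and its spread across folds
```

A large spread across the folds confirms that any single train/test split (and hence any single `random_state`) gives an unreliable picture of the model's quality.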

Kilian