Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or more variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

In other words, regression is a statistical method that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables). Typically the dependent variables are modeled with probability distributions whose parameters are assumed to vary (deterministically) with the independent variables.
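
As a rough illustration of the description above, the simplest case (one independent variable, a straight-line mean, least-squares fitting) can be sketched in plain Python. The function names and the data are made up for this example; real projects would normally use an established statistics library instead.

```python
from statistics import mean


def fit_simple_linear(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Evaluate the fitted line at each value in x_new."""
    return [intercept + slope * xi for xi in x_new]


# Hypothetical data, invented for illustration: y is exactly 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_linear(x, y)
predictions = predict(intercept, slope, [5.0])
```

Here the independent variable x determines the mean of Y through the two fitted parameters; more general regression models replace the straight line and the squared-error criterion with other functional forms and distributions.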

Tag usage

Questions on regression should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics and machine learning.

9532 questions

5 answers

Python natural smoothing splines

I am trying to find a Python package that would give an option to fit natural smoothing splines with a user-selectable smoothing factor. Is there an implementation for that? If not, how would you use what is available to implement it yourself? By…

asked Jul 13 '18 at 08:41

Niko Föhr

3 answers

How to calculate the regularization parameter in linear regression

When we have a high degree linear polynomial that is used to fit a set of points in a linear regression setup, to prevent overfitting, we use regularization, and we include a lambda parameter in the cost function. This lambda is then used to update…

machine-learning data-mining regression

asked Aug 29 '12 at 16:04

London guy

3 answers

Linear Regression with a known fixed intercept in R

I want to calculate a linear regression using the lm() function in R. Additionally I want to get the slope of a regression, where I explicitly give the intercept to lm(). I found an example on the internet and I tried to read the R-help "?lm"…

r regression linear-regression lm

asked Sep 07 '11 at 11:38

R_User
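
Independent of R's lm(), the computation behind a fixed-intercept fit can be sketched in Python. This is only an illustration of the least-squares arithmetic, not the approach from the question's answers: with the intercept held at a known value a, the slope that minimizes the squared residuals is sum(x * (y - a)) / sum(x * x). The function name and data below are hypothetical.

```python
def slope_with_fixed_intercept(x, y, intercept):
    """Least-squares slope of y = intercept + slope * x with the intercept held fixed."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least one non-zero value")
    # Subtract the known intercept, then regress the remainder through the origin.
    return sum(xi * (yi - intercept) for xi, yi in zip(x, y)) / sxx


# Hypothetical data, invented for illustration: y is exactly 3 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 7.0, 9.0, 11.0]
slope_true_intercept = slope_with_fixed_intercept(x, y, intercept=3.0)
slope_through_origin = slope_with_fixed_intercept(x, y, intercept=0.0)
```

Forcing a different intercept (here 0 instead of 3) changes the slope, which is why the intercept has to be supplied explicitly rather than estimated.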

2 answers

What does the capital letter "I" in R linear regression formula mean?

I haven't been able to find an answer to this question, largely because googling anything with a standalone letter (like "I") causes issues. What does the "I" do in a model like this? data(rock) lm(area~I(peri - mean(peri)), data =…

r regression formula polynomials

asked Jun 12 '14 at 19:26

Nancy

5 answers

setting values for ntree and mtry for random forest regression model

I'm using the R package randomForest to do a regression on some biological data. My training data size is 38772 X 201. I just wondered---what would be a good value for the number of trees ntree and the number of variables per level mtry? Is there an…

r statistics machine-learning regression random-forest

asked Dec 19 '12 at 16:09

DOSMarter

4 answers

how to use the Box-Cox power transformation in R

I need to transform some data into a 'normal shape' and I read that Box-Cox can identify the exponent to use to transform the data. From what I understood car::boxCoxVariable(y) is used for response variables in linear models,…

r regression transformation

asked Nov 30 '15 at 13:14

dede

4 answers

What is the difference between Multiple R-squared and Adjusted R-squared in a single-variate least squares regression?

Could someone explain to the statistically naive what the difference between Multiple R-squared and Adjusted R-squared is? I am doing a single-variate regression analysis as follows: v.lm <- lm(epm ~ n_days, data=v) …

r statistics regression

asked May 20 '10 at 02:17

fmark
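
For readers comparing the two quantities named in this question, the standard definitions can be sketched in Python. This assumes a model with an intercept and p predictors fitted to n observations; the function names and the numbers are hypothetical and are not taken from the question's data.

```python
def r_squared(y, fitted):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical data: fitted values come from the least-squares line 2.2 + 0.6 * x
# for x = 1..5, so there is a single predictor (p = 1).
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(y, fitted)
adj_r2 = adjusted_r_squared(r2, n=len(y), p=1)
```

The adjustment penalizes the number of predictors relative to the sample size, so the adjusted value is never larger than the unadjusted one for a model with an intercept.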

7 answers

predict.lm() with an unknown factor level in test data

I am fitting a model to factor data and predicting. If the newdata in predict.lm() contains a single factor level that is unknown to the model, all of predict.lm() fails and returns an error. Is there a good way to have predict.lm() return a…

r regression linear-regression lm

asked Nov 26 '10 at 12:15

Stephan Kolassa

3 answers

GridSearchCV - XGBoost - Early Stopping

I am trying to do a hyperparameter search using scikit-learn's GridSearchCV on XGBoost. During the grid search I'd like it to stop early, since that reduces search time drastically and (I expect) gives better results on my prediction/regression task.…

python-3.x scikit-learn regression data-science xgboost

asked Mar 24 '17 at 07:15

ayyayyekokojambo

3 answers

Difference between cross_val_score and cross_val_predict

I want to evaluate a regression model built with scikit-learn using cross-validation, and I am confused about which of the two functions, cross_val_score and cross_val_predict, I should use. One option would be: cvs = DecisionTreeRegressor(max_depth =…

python machine-learning scikit-learn regression cross-validation

asked Apr 25 '17 at 14:25

Bobipuegi

4 answers

What is the difference between xgb.train and xgb.XGBRegressor (or xgb.XGBClassifier)?

I already know "xgboost.XGBRegressor is a Scikit-Learn Wrapper interface for XGBoost." But do they have any other difference?

python machine-learning scikit-learn regression xgboost

asked Nov 07 '17 at 07:54

Statham

4 answers

Show confidence limits and prediction limits in scatter plot

I have two arrays of data for height and weight: import numpy as np, matplotlib.pyplot as plt heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65]) weights =…

numpy matplotlib scipy regression seaborn

asked Nov 27 '14 at 06:07

Eric Bal

3 answers

Scikit-learn cross validation scoring for regression

How can one use cross_val_score for regression? The default scoring seems to be accuracy, which is not very meaningful for regression. Suppose I would like to use mean squared error; is it possible to specify that in cross_val_score? Tried the…

python scikit-learn regression

asked Jun 10 '14 at 03:08

clwen

12 answers

ValueError: feature_names mismatch: in xgboost in the predict() function

I have trained an XGBoostRegressor model. When I have to use this trained model for predicting for a new input, the predict() function throws a feature_names mismatch error, although the input feature vector has the same structure as the training…

python pandas machine-learning regression xgboost

asked Feb 20 '17 at 07:43

Sujay S Kumar

7 answers

sklearn LogisticRegression and changing the default threshold for classification

I am using LogisticRegression from the sklearn package, and have a quick question about classification. I built a ROC curve for my classifier, and it turns out that the optimal threshold for my training data is around 0.25. I'm assuming that the…

python machine-learning scikit-learn regression classification

asked Jul 14 '15 at 21:12

Chetan Prabhu
