Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or multiple variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

In other words, regression estimates the relationship between a dependent variable (usually denoted by Y) and one or more other variables (known as independent variables). Typically the dependent variable is modeled with a probability distribution whose parameters are assumed to vary deterministically with the independent variables.
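As a rough illustration of the description above, here is a minimal sketch of an ordinary least squares fit using NumPy. The helper name `fit_linear_regression` and the synthetic data are assumptions made for this example only; the tag wiki does not prescribe any particular library or method.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares fit of y ~ b0 + X @ b; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted parameter is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[0], params[1:]

# Synthetic, noise-free data generated from y = 1 + 2*x1 - 3*x2 (illustrative only).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

intercept, coefs = fit_linear_regression(X, y)
```

Here Y is the dependent variable and the two columns of X are the independent variables. With real, noisy data the recovered parameters would only approximate the underlying relationship.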

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the StackExchange site for statistics and machine learning.

9532 questions
27
votes
3 answers

GridSearch over MultiOutputRegressor?

Let's consider a multivariate regression problem (2 response variables: Latitude and Longitude). Currently, a few machine learning model implementations like Support Vector Regression sklearn.svm.SVR do not provide native support of…
Sharan N
  • 628
  • 1
  • 7
  • 12
27
votes
2 answers

How to increase the model accuracy of logistic regression in Scikit python?

I am trying to predict the admit variable with predictors such as gre, gpa and ranks. But the prediction accuracy is very low (0.66). The dataset is given below. https://gist.github.com/abyalias/3de80ab7fb93dcecc565cee21bd9501a The first few rows of…
27
votes
2 answers

Local linear regression in R -- locfit() vs locpoly()

I am trying to understand the different behaviors of these two smoothing functions when given apparently equivalent inputs. My understanding was that locpoly just takes a fixed bandwidth argument, while locfit can also include a varying part in its…
user1870614
  • 404
  • 1
  • 4
  • 5
27
votes
3 answers

Multivariate polynomial regression with numpy

I have many samples (y_i, (a_i, b_i, c_i)) where y is presumed to vary as a polynomial in a,b,c up to a certain degree. For example for a given set of data and degree 2 I might produce the model y = a^2 + 2ab - 3cb + c^2 +.5ac This can be done…
MRocklin
  • 55,641
  • 23
  • 163
  • 235
26
votes
2 answers

Python threading error - must be an iterable, not int

I'm trying to calculate the rolling r-squared of a regression between the first column and each of the other columns in a dataframe (first column and second, first column and third, etc.). But when I try threading, it keeps raising the error TypeError:…
26
votes
2 answers

How to correctly use scikit-learn's Gaussian Process for a 2D-inputs, 1D-output regression?

Prior to posting I did a lot of searches and found this question which might be exactly my problem. However, I tried what is proposed in the answer but unfortunately this did not fix it, and I couldn't add a comment to request further explanation,…
Julie
  • 263
  • 1
  • 3
  • 6
25
votes
4 answers

Non-linear regression in C#

I'm looking for a way to produce a non-linear (preferably quadratic) curve, based on a 2D data set, for predictive purposes. Right now I'm using my own implementation of ordinary least squares (OLS) to produce a linear trend, but my trends are much…
Polynomial
  • 27,674
  • 12
  • 80
  • 107
25
votes
2 answers

Understanding Tensorflow LSTM Input shape

I have a dataset X which consists of N = 4000 samples; each sample consists of d = 2 features (continuous values) spanning back t = 10 time steps. I also have the corresponding 'labels' of each sample, which are also continuous values, at time step 11.…
Renier Botha
  • 830
  • 1
  • 10
  • 19
25
votes
2 answers

Can I draw a regression line and show parameters using scatterplot with a pandas dataframe?

I would like to produce a scatterplot from a pandas dataframe using the following code: df.plot.scatter(x='one', y='two', title='Scatterplot') Is there a parameter I can send with the statement, so it plots a regression line and shows the…
Markus W
  • 1,451
  • 5
  • 19
  • 32
25
votes
3 answers

python stats models - quadratic term in regression

I have the following linear regression: import statsmodels.formula.api as sm model = sm.ols(formula = 'a ~ b + c', data = data).fit() I want to add a quadratic term for b in this model. Is there a simple way to do this with statsmodels.ols? Is…
datavoredan
  • 3,536
  • 9
  • 32
  • 48
24
votes
7 answers

Nonlinear regression with python - what's a simple method to fit this data better?

I have some data that I want to fit so I can make some estimations for the value of a physical parameter given a certain temperature. I used numpy.polyfit for a quadratic model, but the fit isn't quite as nice as I'd like it to be and I don't have…
Jinx
  • 511
  • 1
  • 3
  • 10
24
votes
2 answers

Multiple-output Gaussian Process regression in scikit-learn

I am using scikit-learn for Gaussian process regression (GPR) to predict data. My training data are as follows: x_train = np.array([[0,0],[2,2],[3,3]]) #2-D cartesian coordinate points y_train = np.array([[200,250,…
24
votes
1 answer

How to use lightgbm.cv for regression?

I want to do a cross validation for a LightGBM model with lgb.Dataset and use early_stopping_rounds. The following approach works without a problem with XGBoost's xgboost.cv. I prefer not to use scikit-learn's approach with GridSearchCV, because it…
Marius
  • 409
  • 1
  • 5
  • 9
24
votes
5 answers

How to obtain RMSE out of lm result?

I know there is a small difference between $sigma and the concept of root mean squared error. So, I am wondering what is the easiest way to obtain RMSE out of the lm function in R? res<-lm(randomData$price ~randomData$carat+ …
Jeff
  • 7,767
  • 28
  • 85
  • 138
24
votes
4 answers

Using Keras ImageDataGenerator in a regression model

I want to use the flow_from_directory method of the ImageDataGenerator to generate training data for a regression model, where the target value can be any float value between 1 and -1. flow_from_directory has a "class_mode" parameter with the…
Oblomov
  • 8,953
  • 22
  • 60
  • 106