Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or multiple variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

In other words, regression is a statistical technique that estimates the relationship between one dependent variable (usually denoted by Y) and one or more other variables (known as independent variables). Typically the dependent variable is modeled with a probability distribution whose parameters are assumed to vary (deterministically) with the independent variables.
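As a minimal sketch of the idea above, the following fits an ordinary least squares line with NumPy. It assumes the simplest case (the mean of Y varies linearly with the independent variables); the helper names `fit_linear_regression` and `predict` and the data are made up for illustration and are not part of any package mentioned on this page.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares fit; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # treat a 1-D input as a single predictor
    # Prepend a column of ones so the first fitted parameter is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[0], params[1:]

def predict(X, intercept, coefs):
    """Evaluate the fitted linear model at new predictor values."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    return intercept + X @ coefs

# Illustrative, made-up data generated exactly from y = 2 + 3x (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

intercept, coefs = fit_linear_regression(x, y)
print(intercept, coefs)
print(predict([5.0], intercept, coefs))
```

With noisy real data the recovered intercept and slope would only approximate the underlying values, and other regression families (for example logistic or Poisson regression) replace the linear mean with a different link between the predictors and the distribution's parameters.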

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the StackExchange site for statistics and machine learning.

9532 questions
9
votes
9 answers

Best approach to what I think is a machine learning problem

I am looking for some expert guidance here on what the best approach is for me to solve a problem. I have investigated some machine learning, neural networks, and stuff like that. I've investigated Weka, some sort of Bayesian solution.. R.. several…
9
votes
2 answers

XGBoost Best Iteration

I am running a regression using the XGBoost Algorithm as, clf = XGBRegressor(eval_set = [(X_train, y_train), (X_val, y_val)], early_stopping_rounds = 10, n_estimators = 10, …
Alessandro Ceccarelli
  • 1,775
  • 5
  • 21
  • 41
9
votes
2 answers

How to restrict output of a neural net to a specific range?

I'm using Keras for a regression task and want to restrict my output to a range (say between 1 and 10). Is there a way to ensure this?
megan adams
  • 355
  • 1
  • 6
  • 10
9
votes
1 answer

Do dynlm and dlm have same mathematical expressions?

I am currently using dynamic linear regression (dynlm) for my analysis. However, I also found another model called the dynamic linear model (dlm). I find that dlm has an official mathematical expression by West and Harrison (1989) and everywhere.…
Eric
  • 528
  • 1
  • 8
  • 26
9
votes
1 answer

What standard errors are returned with predict.glm(..., type = "response", se.fit = TRUE)?

I am going to fit the model on the data provided in this excellent example on how to compute the 95% confidence interval for the response, after performing a logistic regression: foo <- mtcars[,c("mpg","vs")]; names(foo) <- c("x","y") mod <- glm(y ~…
Alex
  • 15,186
  • 15
  • 73
  • 127
9
votes
2 answers

R - Plm and lm - Fixed effects

I have a balanced panel data set, df, that essentially consists of three variables, A, B and Y, that vary over time for a bunch of uniquely identified regions. I would like to run a regression that includes both regional (region in the equation…
Jasper
  • 133
  • 1
  • 2
  • 8
9
votes
1 answer

How does R handle ordinal predictors in lm()?

As I understand it, when you fit a linear model in R using a nominal predictor, R essentially uses dummy 1/0 variables for each level (except the reference level), and then gives a regular old coefficient for each of these variables. What does it…
MissMonicaE
  • 709
  • 1
  • 8
  • 15
9
votes
5 answers

Solve best fit polynomial and plot drop-down lines

I'm using R 3.3.1 (64-bit) on Windows 10. I have an x-y dataset that I've fit with a 2nd order polynomial. I'd like to solve that best-fit polynomial for x at y=4, and plot drop-down lines from y=4 to the x-axis. This will generate the data in a…
jeffgoblue
  • 319
  • 1
  • 3
  • 11
9
votes
1 answer

Scikit-Learn SVR Prediction Always Gives the Same Value

I'm trying to predict IMDB scores (film ratings) using Support Vector Regression in Scikit-Learn. The problem is that it always gives the same prediction for every input. When I predict using the training data, it gives varied results. But when using data…
9
votes
1 answer

How to compute standard error from ODR results?

I use scipy.odr in order to make a fit with uncertainties on both x and y following this question Correct fitting with scipy curve_fit including errors in x? After the fit I would like to compute the uncertainties on the parameters. Thus I look at…
Ger
  • 9,076
  • 10
  • 37
  • 48
9
votes
1 answer

Why is bam from mgcv slow for some data?

I am fitting the same Generalized Additive Model on multiple data sets using the bam function from mgcv. For most of my data sets the fit completes within a reasonable time, between 10 and 20 minutes. For a few data sets the run takes more than…
unique2
  • 2,162
  • 2
  • 18
  • 23
9
votes
2 answers

Polynomial regression in spark/ or external packages for spark

After a good amount of searching on the net for this topic, I am ending up here hoping to get some pointers. Please read further. After analyzing Spark 2.0 I concluded polynomial regression is not possible with Spark (Spark alone), so is there…
sourabh
  • 223
  • 2
  • 13
9
votes
1 answer

ggplot2: add regression equations and R2 and adjust their positions on plot

Using df and the code below library(dplyr) library(ggplot2) library(devtools) df <- diamonds %>% dplyr::filter(cut%in%c("Fair","Ideal")) %>% dplyr::filter(clarity%in%c("I1" , "SI2" , "SI1" , "VS2" , "VS1", "VVS2")) %>% …
shiny
  • 3,380
  • 9
  • 42
  • 79
9
votes
2 answers

Naming explanatory variables in regression output

Each one of my variables is a list on its own. I am using a method found on another thread here. import numpy as np import statsmodels.api as sm y = [1,2,3,4,3,4,5,4,5,5,4,5,4,5,4,5,6,5,4,5,4,3,4] x = [ …
9
votes
1 answer

Compute a kernel ridge regression in R for model selection

I have a dataframe df df<-structure(list(P = c(794.102395099402, 1299.01021921817, 1219.80731174175, 1403.00786976395, 742.749487463385, 340.246973543409, 90.3220586792255, 195.85557320714, 199.390867672674, 191.4970921278, 334.452413539092,…
SimonB
  • 670
  • 1
  • 10
  • 25