Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or multiple variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

In other words, regression is a statistical method that estimates the relationship between one dependent variable (usually denoted by Y) and a set of other variables (known as independent variables). Typically the dependent variables are modeled with probability distributions whose parameters are assumed to vary (deterministically) with the independent variables.
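As a rough illustration of the description above, here is a minimal sketch of the simplest case, ordinary least squares with one independent variable, assuming NumPy is available. The helper name `fit_linear_regression` and the synthetic data (a true intercept of 2 and slope of 3 with Gaussian noise) are made up for this example and are not part of the tag wiki.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Fit y ~ intercept + X @ coefs by ordinary least squares.

    Hypothetical helper for illustration; returns (intercept, coefs).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted weight is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


# Synthetic data (an assumption for this sketch): Y = 2 + 3*x + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=200)

intercept, coefs = fit_linear_regression(x.reshape(-1, 1), y)
print(f"intercept={intercept:.2f}, slope={coefs[0]:.2f}")
```

Here the mean of Y is the distribution parameter that varies deterministically with x, while the noise term accounts for the remaining spread.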

Tag usage

Questions under this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics and machine learning.

9532 questions
8
votes
1 answer

Potential bug in R's `polr` function when run from a function environment?

I may have found some sort of bug in the polr function (ordinal / polytomous regression) of the MASS library in R. The problem seems to be related to the use of coef() on the summary object, but maybe it is not. The problem occurs in a function of type: pol_me <-…
tomka
  • 2,516
  • 7
  • 31
  • 45
8
votes
3 answers

Finding non-linear correlations in R

I have about 90 variables stored in data[2-90]. I suspect about 4 of them will have a parabola-like correlation with data[1]. I want to identify which ones have the correlation. Is there an easy and quick way to do this? I have tried building a…
dorien
  • 5,265
  • 10
  • 57
  • 116
8
votes
2 answers

R: How to read Nomograms to predict the desired variable

I am using Rstudio. I have created nomograms using function nomogram from package rms using following code (copied from the example code of the documentation): library(rms) n <- 1000 # define sample size set.seed(17) # so can reproduce the…
dc95
  • 1,319
  • 1
  • 22
  • 44
8
votes
2 answers

Scatter plot kernel smoothing: ksmooth() does not smooth my data at all

Original question: I want to smooth my explanatory variable, something like the speed data of a vehicle, and then use these smoothed values. I searched a lot and found nothing that directly answers my question. I know how to calculate the kernel density…
hajar
  • 103
  • 1
  • 1
  • 5
8
votes
3 answers

Distinction between linear and non linear regression?

In Machine Learning, we say that $w_1x_1 + w_2x_2 + \dots + w_nx_n$ is a linear regression model, where $w_1, w_2, \dots, w_n$ are the weights and $x_1, x_2, \dots, x_n$ are the features, whereas $w_1x_1^2 + w_2x_2^2 + \dots + w_nx_n^2$ is a non-linear (polynomial) regression model. However, in…
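The squared-feature model in this excerpt is non-linear in the features but still linear in the weights, so the same least-squares solver can fit it once the inputs are squared. The sketch below assumes NumPy and uses made-up, noiseless data with hypothetical weights (`true_w`) purely for illustration.

```python
import numpy as np

# Hypothetical setup: two features and noiseless targets built from squared inputs.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(300, 2))
true_w = np.array([1.5, -0.5])
y = (X ** 2) @ true_w  # w1*x1^2 + w2*x2^2

# Square the features first, then solve an ordinary linear least-squares problem.
w_hat, *_ = np.linalg.lstsq(X ** 2, y, rcond=None)
print(w_hat)
```

Because the targets are noiseless, the recovered weights should match `true_w` up to floating-point error.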
8
votes
2 answers

How to do 2SLS IV regression using statsmodels python?

I'm trying to do 2 stage least squares regression in python using the statsmodels library: from statsmodels.sandbox.regression.gmm import IV2SLS resultIV = IV2SLS(dietdummy['Log Income'], dietdummy.drop(['Log…
NANA
  • 123
  • 1
  • 1
  • 9
8
votes
2 answers

How can I omit the regression intercept from my results table in stargazer

I run a regression of the type model <- lm(y~x1+x2+x3, weights = wei, data=data1) and then create my table, t <- stargazer(model, omit="x2", omit.labels="x1"), but I haven't found a way to omit the intercept results from the table. I need it in the…
Felipe Alvarenga
  • 2,572
  • 1
  • 17
  • 36
8
votes
2 answers

is there an R function for Stata's xtnbreg?

Have been using Stata to run negative binomial regressions in a replication. Not sure what is under the hood on how Stata does this, but wanted to know if there is an R function/package that does the same thing? The R will give me a better idea of…
eric
  • 81
  • 2
8
votes
1 answer

Gaussian Process scikit-learn - Exception

I want to use Gaussian Processes to solve a regression task. My data is as follows: each X vector has a length of 37, and each Y vector has a length of 8. I'm using the sklearn package in Python, but trying to use Gaussian processes leads to an…
Julian
  • 556
  • 1
  • 8
  • 27
8
votes
2 answers

How can I train a simple, non-linear regression model with tensor flow?

I've seen this example for linear regression and I would like to train a model where What I've tried #!/usr/bin/env python """Example for learning a regression.""" import tensorflow as tf import numpy # Parameters learning_rate =…
Martin Thoma
  • 124,992
  • 159
  • 614
  • 958
8
votes
0 answers

Multicore ggplot2

I am trying to analyze a very large dataset (over 10 million rows; OK, it's big in my field). I'm trying to generate a smoothed regression plot using the following command: ggplot(dataset, aes(x=IV, y=DV)) + geom_smooth(method="loess") This has…
Sean C. Rife
  • 103
  • 7
8
votes
1 answer

R2 values - dplyr and broom

I am using the dplyr and broom combination (per below) and following "Fitting several regression models with dplyr" to extract the regression coefficients of regressions by group. However, I am also interested in the R2 value of each individual…
user1885116
  • 1,757
  • 4
  • 26
  • 39
8
votes
1 answer

Java 8 change in UTF-8 decoding

We recently migrated our application to JDK 8 from JDK 7. After the change, we ran into a problem with the following snippet of code. String output = new String(byteArray, "UTF-8"); The byte array may contain invalid UTF-8 byte sequences. The same…
Jiraiya
  • 336
  • 2
  • 8
8
votes
3 answers

Why is logistic regression called regression?

According to what I have understood, linear regression predicts the outcome which can have continuous values, whereas logistic regression predicts outcome which is discrete. It seems to me that logistic regression is similar to a classification…
8
votes
3 answers

Observation deleted due to missingness in R

I am busy with a regression model in R and I have about 16 000 observations. One of these observations causes me to get the following error message: # (1 observation deleted due to missingness) Is there a way in R to identify this one observation?
Jason Samuels
  • 951
  • 6
  • 22
  • 40