Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or more variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

In other words, regression estimates the relationship between one dependent variable (usually denoted by Y) and a set of other variables (known as independent variables). Typically the dependent variable is modeled with a probability distribution whose parameters are assumed to vary (deterministically) with the independent variables.
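As a minimal sketch of the idea above, the following fits a simple linear regression by ordinary least squares with NumPy. The data are synthetic and the true coefficients (intercept 2, slope 3) are assumptions chosen purely for illustration; the conditional mean of Y is modeled as a linear function of a single independent variable x.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2 + 3x + Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=200)

# Design matrix with an intercept column followed by the predictor
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares estimate of the coefficients
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# Predicted conditional mean of Y at a few new x values
x_new = np.array([0.0, 5.0, 10.0])
y_pred = intercept + slope * x_new
```

The estimated `intercept` and `slope` should land close to the assumed true values, and `y_pred` holds the fitted mean of Y at each new x. Dedicated packages such as statsmodels or scikit-learn add standard errors, diagnostics, and other model families on top of this basic fit.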

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics and machine learning.

9532 questions
2
votes
3 answers

Can I apply "classification" first and then "regression" to the same data set?

I am a beginner in data science and need help with a topic. I have a data set about the customers of an institution. My goal is to first find out which customers will pay this institution and then find out how much money the paying customers will…
2
votes
2 answers

for loop regression analysis in R

I have a dataset of fish abundance data on which I want to perform a regression analysis. However, I want to perform a lot of regressions on different subsets of the data, without having to do this manually, and save the coefs and P value in a new…
Stevestingray
  • 399
  • 2
  • 12
2
votes
1 answer

ModuleNotFoundError for module 'linearmodels'

I want to perform an OLS Panel Regression import pandas as pd import numpy as np import statsmodels.api as sm from linearmodels.datasets import wage_panel from linearmodels.panel import PanelOLS data = wage_panel.load() But I get this…
CSBossmann
  • 193
  • 2
  • 11
2
votes
0 answers

What is the best way to make the data as stationary & inverse transform in time series - Python

I applied first differencing because the time series is not stationary. When I invert the transformation, some values come out negative, because diff() produces negative values. Is there a way to sort it out and bring back the data in original…
Chandra
  • 939
  • 1
  • 6
  • 12
2
votes
1 answer

do scaling data between 0 and 1, and converting their distribution to a normal distribution changes model's RMSLE

I have a question regarding RMSE and RMSLE: to create my model, I first scaled all my feature and target data between 0 and 1 and then converted their distribution to a normal distribution using a Gauss rank scaler. After I fitted an XGBoost model and…
2
votes
0 answers

Php-ml: PHP Machine Learning library, behavior issues

I'm using this library to predict numbers. The thing is that I don't understand or don't get the problem in this piece of code; it always returns the same >>> $SVR = new SVR(Kernel::LINEAR); => Phpml\Regression\SVR {#4304} >>> $SVR->train([[1], [2],…
Christian
  • 481
  • 1
  • 7
  • 24
2
votes
2 answers

Difference between exponential fit and log-linear fit

I have data with a clear exponential dependency. I tried to fit a curve through it with two different, very simple models. The first one is a straightforward exponential fit. For the second one, I log transformed the y values and then used a linear…
nhaus
  • 786
  • 3
  • 13
2
votes
1 answer

How to test for spatial non-stationarity in R to determine if local regression model is needed?

I have a dataset for which I implement a regression model and from which I assume that the coefficients vary locally. If a spatial non-stationarity is given, it makes sense to run a local regression model, in my case a Geographically Weighted…
the_chimp
  • 205
  • 4
  • 18
2
votes
2 answers

How to create segmented graphs in ggplot2 with legend?

I have data as follows: I would like to create a segmented plot (like a pre- and post- plot), including the vertical line at t = 10 to indicate the change. t refers to the elapsed time, x refers to 0 for pre-implementation, 1 for…
HNSKD
  • 1,614
  • 2
  • 14
  • 25
2
votes
0 answers

PRESS Stat Error - Can't use NA as column index with `[` at position 1

I have built the regression model as follows: model = lm(delay ~ industry + public + quality + finished, data) and I am trying to run the PRESS statistic function from the qpcR package > press = PRESS(model, verbose = FALSE) but I am getting the…
Alex
  • 23
  • 1
  • 5
2
votes
1 answer

Spanning header for gtsummary regression table

I am using the gtsummary package to tabulate my regression results. With difficulty, I have tried to give my table a spanning header using the following function modify_spanning_header(starts_with("stat_") ~ "**Logistic regression for years in US…
Mohamed Yusuf
  • 390
  • 1
  • 11
2
votes
2 answers

How to include lagged variables in statsmodel ols regression

Is there a way to specify a lagged independent variable in a statsmodels OLS regression? Here's a sample dataframe and OLS model specification below. I'd like to include a lagged variable in the model. df = pd.DataFrame({ "y":…
kms
  • 1,810
  • 1
  • 41
  • 92
2
votes
1 answer

Logistic regression on single subject data?

I have a data frame with n=18 participants. There are 90 observations across 3 IVs and 1 binary DV (see below for shortened example). data <- data.frame(age = c(21, 30, 25, 41, 29, 33),IQ=c(60,70,80,90,100,200),SAT=(2400,2200,1400,1550,1470,1300),…
Chester
  • 67
  • 6
2
votes
1 answer

How to use scale and shape parameters of gamma GLM in statsmodels

The task: I have data that looks like this: I want to fit a generalized linear model (GLM) to this from a gamma family using statsmodels. Using this model, for each of my observations I want to calculate the probability of observing a value that is…
Willem
  • 976
  • 9
  • 24
2
votes
1 answer

Fill Pandas Column NaNs with numpy array values

Sorry if this question seems too basic, but I've been looking for an answer and didn't find one. So, I have a dataset with lots of NaN values and I've been working on some regressions to predict those nulls, and since the prediction is given as a…
fega_zero
  • 125
  • 9