Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or more variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

In other words, regression is a statistical technique that estimates the relationship between one dependent variable (usually denoted by Y) and one or more other variables (known as independent variables). Typically, the dependent variable is modeled with a probability distribution whose parameters are assumed to vary deterministically with the independent variables.
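As a rough illustration of the description above, here is a minimal sketch of ordinary least squares in Python with NumPy. The toy data, the true coefficients (2.0 and 3.0), and the variable names are assumptions made up for this example. Here Y is treated as normally distributed, with a mean that varies linearly with a single independent variable x.

```python
import numpy as np

# Hypothetical toy data: y depends linearly on x plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=200)

# Design matrix with a column of ones so the model has an intercept.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: find beta minimising ||X @ beta - y||^2.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# Estimated conditional mean of Y at a few new values of x.
x_new = np.array([0.0, 5.0, 10.0])
y_pred = intercept + slope * x_new

print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

The estimated intercept and slope should land close to the values used to generate the data. Other regression models (logistic, mixed effects, Gaussian process, and so on) change the assumed distribution or the form of the dependence, not the basic idea.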

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics and machine learning.

9532 questions
2
votes
1 answer

Plotly: How to display a regression line for one variable against multiple other time series?

With a dataset such as time series for various stocks, how can you easily display a regression line for one variable against all others and quickly define a few aesthetic elements such as: which variable to plot against the others, theme color for…
vestland
  • 55,229
  • 37
  • 187
  • 305
2
votes
1 answer

Graphs of the mixed effects model residuals using the ggplot2 function

I am trying to graph the residual effects of the mixed effects model using the ggplot2 function. However, after performing a search I found some functions available but what seems to me is that for the function nlme they are not working. The graphs…
user55546
  • 37
  • 1
  • 15
2
votes
1 answer

Panel regression - Estimators

I am trying to do a panel regression in R. pdata <- pdata.frame(NEW, index = c("Year")) And: R1 <- plm(Market_Cap ~ GDP_growthR + Volatility_IR + FDI + Savings_rate, data=pdata, model="between") However when I want to use the within (or…
S_Star
  • 53
  • 5
2
votes
1 answer

Adjusted regression line considering different factors in ggplot2

I'm trying to reproduce the graph below, where the internal lines are the adjusted regression lines: However, due to some factor, it is not being plotted what it should be, that is, a single line is being presented, and more, the different…
user55546
  • 37
  • 1
  • 15
2
votes
0 answers

Fit a no intercept binary model in caret

Let's take data : y <- sample(0:1, 125, T) x <- data.frame(rnorm(125), rexp(125)) I want to perform cross validation on data above, without intercept. To exclude intercept in linear models in caret we just need to use : tuneGrid =…
John
  • 1,849
  • 2
  • 13
  • 23
2
votes
2 answers

Using mob() trees (partykit package) with logistic() model

I am trying to use model-based recursive partitioning (MOB) with the mob() function (from the partykit package) to obtain the different parameters associated with each feature depending on the optimal partition found using the logistic() regression…
vog
  • 770
  • 5
  • 11
2
votes
1 answer

Prevent empty data to enter an lm() call in R?

I'm trying to come up with a mechanism to prevent empty results in my lm() output. To be exact, I want to first find them and then prevent them from being entered into a new lm() call. For example, in the example below, cf.type99:time3 &…
rnorouzian
  • 7,397
  • 5
  • 27
  • 72
2
votes
2 answers

Preprocess Value in Degrees for Regression to Avoid Discontinuity

I have a set of image data and I'm trying to train a neural network to predict a value in degrees as an output inside a range of [-180, 180). I don't like the idea of the large discontinuity between -180 and 180 (or equivalently 0 and 360) for…
Alex Wulff
  • 2,039
  • 3
  • 18
  • 29
2
votes
1 answer

Coverage probability calculation for LM

I am trying to calculate coverage probability for a set of residual bootstrap replicates I generated on the intercept and slope of a regression. Can anyone show me how to calculate coverage probability of confidence intervals? Many thanks. Note that…
cliu
  • 933
  • 6
  • 13
2
votes
1 answer

How to apply gaussian process regression on series problems?

I have been working on a problem as follows, that I wish to perform regression on using Gaussian Process Regressor (GPR): Input (X): [list1, list2, list3, ....] # All the lists (or arrays) may not be of the same size Output(y): [value1, value2,…
2
votes
1 answer

Linear Regression - Get Feature Importance using MinMaxScaler() - Extremely large coefficients

I'm trying to get the feature importances for a regression model. I have 58 independent variables and one dependent variable. Most of the independent variables are numerical and some are binary. First I used this: X = dataset.drop(['y'], axis=1) y…
2
votes
1 answer

Rolling stepwise regression with dplyr

I want to make a rolling stepwise regression with dplyr, do() and rollapply(). My code for the data looks like this: FUND_DATA <- tibble( DATE = 1:10, FUND1 = rnorm(10), FUND2 = rnorm(10), FUND3 = rnorm(10), FUND4 = rnorm(10)) These…
MeT
  • 21
  • 3
2
votes
1 answer

Tidymodels(Fitting a random forest with fit_samples()): Fold01: internal: Error: Must group by variables found in `.data`

Overview I have produced a random forest regression model, and my aim is to fit the model using the fit_samples() function and then tune the hyperparameters. However, I am experiencing this error message below: Error Message: ! Fold01:…
Alice Hobbs
  • 1,021
  • 1
  • 15
  • 31
2
votes
1 answer

Adding custom column to regression table (tab_model, sjplot)?

I'd like to add my own column containing VIF values to a regression table that I've made with the tab_model() function in the sjplot package. Here's an example of what I'm trying to do: log_fit <- glm(Sepal.Length ~ ., data =…
pfadenhw
  • 119
  • 6
2
votes
0 answers

Optimization of CNN for Regression Keras Tuner

I am using Keras Tuner to optimize a CNN model for a regression problem. Basically, I have sequences of DNA that I turned into a matrix in order to use them as images to train a CNN model. What I want to predict is a percentage that depends on those…