
I was wondering how to get the actual components from `predict(..., type = "terms")`. I know that if I take the `rowSums` and add the `attr(, "constant")` value to each, I get the predicted values, but I'm not sure how this `attr(, "constant")` is split up between the columns. In short: how do I alter the matrix returned by `predict` so that each value is the model coefficient multiplied by the prediction data? The result should be a matrix (or data.frame) with the same dimensions as the one returned by `predict`, but whose `rowSums` add up to the predicted values with no further adjustment.

Note: I realize I could probably take the coefficients produced by the model and matrix multiply them with my prediction matrix but I'd rather not do it that way to avoid any problems that factors could produce.
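For reference, a minimal sketch of that matrix-multiplication route, assuming an unweighted `lm` fitted with an intercept and using `iris` purely for illustration. `model.matrix()` expands factors into dummy columns with the same contrasts as the fit, so the columns line up with `coef(fit)`; the `"assign"` attribute then maps the dummy columns back to their term. This sketch uses the training data only; for new data the model matrix would have to be built from `newdata` with the same terms and factor levels.

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Factors are already expanded into dummy columns, in the same order as coef(fit)
mm <- model.matrix(fit)

# Each column of the model matrix multiplied by its coefficient
contrib <- sweep(mm, 2, coef(fit), "*")

# Collapse dummy columns back to one column per term (assign: 0 = intercept)
asgn    <- attr(mm, "assign")
labs    <- c("(Intercept)", attr(terms(fit), "term.labels"))
by_term <- sapply(split(seq_len(ncol(mm)), asgn),
                  function(j) rowSums(contrib[, j, drop = FALSE]))
colnames(by_term) <- labs

# Rows sum to the fitted values with no extra constant
all.equal(rowSums(by_term), fitted(fit))
```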

Edit: The goal of this question is not to produce a way of summing the rows to get the predicted values, that was just meant as a sanity check.

If I have the equation y = 2*a + 3*b + c and my predicted value is 500, I want to know what 2*a was, what 3*b was, and what c was at that particular point. Right now I feel like these values are being returned by predict but they've been scaled. I need to know how to un-scale them.
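In the notation of this example (treating `c` as the intercept), the "scaling" is a centering: for an unweighted `lm` with an intercept, the source of `predict.lm` subtracts the mean of each model-matrix column (computed on the data used to fit the model) before multiplying by the coefficient, and folds those means into the constant. Writing $\bar a$ and $\bar b$ for those means:

```latex
\begin{aligned}
\text{term}_a &= 2\,(a - \bar a), \qquad \text{term}_b = 3\,(b - \bar b),\\
\text{constant} &= c + 2\bar a + 3\bar b,\\
\text{term}_a + \text{term}_b + \text{constant} &= 2a + 3b + c = y,\\
2a &= \text{term}_a + 2\bar a, \qquad 3b = \text{term}_b + 3\bar b.
\end{aligned}
```

For a factor term the same holds with $\bar a$ replaced by the means of its dummy columns, summed over those columns.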

stat_student
  • The only thing that changes when you scale the predicted variable is the intercept; the coefficients don't change. I've already said this in the answer – Rorschach Sep 04 '15 at 18:03
  • Then I don't understand why the values don't match. When you predict with a linear model you just take the coefficient times the value but this is not the result returned. – stat_student Sep 04 '15 at 18:10
  • Did fitting the model with no intercept and then predicting terms not give what you were looking for? – Rorschach Sep 04 '15 at 18:18
  • Oh, I see, that did give me what I want. Is there a way to convert the original matrix into this new one without refitting the model? My code fits a bunch of models and I'd rather not have to refit them all without an intercept. – stat_student Sep 04 '15 at 18:22
  • I don't think so, because they are different models. You may want to look at this answer http://stats.stackexchange.com/questions/7948/when-is-it-ok-to-remove-the-intercept-in-lm. – Rorschach Sep 04 '15 at 18:53
  • Ok, I realize now that this isn't what I want. In your example they are almost identical, but when I run this code on my models the results are different, because, as you said, they are different models. – stat_student Sep 04 '15 at 20:08
  • My answer got flagged as a duplicate. This should answer your question https://stackoverflow.com/questions/47853831/individual-terms-in-prediction-of-linear-regression/72409317#72409317 – William Chiu May 29 '22 at 02:57

2 Answers


It's not split up between the columns; it corresponds to the intercept. If the model includes an intercept, the constant is the mean of the fitted values. For example,

```r
## With intercept
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data=iris)
tt <- predict(fit, type="terms")
pp <- predict(fit)
attr(tt, "constant")
# [1] 5.843333
attr(scale(pp, scale=F), "scaled:center")
# [1] 5.843333
## or
mean(pp)
# [1] 5.843333
```
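Where that number comes from can be checked directly: assuming an unweighted, full-rank `lm` with an intercept, the constant is the linear predictor evaluated at the column means of the model matrix, which by linearity is also the mean of the fitted values.

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
tt  <- predict(fit, type = "terms")

# Column means of the model matrix: intercept, Sepal.Width, Species dummies
avx <- colMeans(model.matrix(fit))

# Linear predictor at those means
const_manual <- sum(coef(fit) * avx)
all.equal(const_manual, attr(tt, "constant"))
all.equal(const_manual, mean(fitted(fit)))
```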

If you fit the model without an intercept, the constant is 0, so you get a matrix whose rowSums correspond to the predictions.

```r
## Without intercept
fit1 <- lm(Sepal.Length ~ Sepal.Width + Species - 1, data=iris)
tt1 <- predict(fit1, type="terms")
attr(tt1, "constant")
# [1] 0

all.equal(rowSums(tt1), predict(fit1))
## [1] TRUE
```
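In the no-intercept fit nothing is centered, so each column is literally coefficient times value. The check below shows this for `Sepal.Width`; note that, as discussed in the comments, `fit1` is a different model from `fit`, so its coefficients are not the same.

```r
fit1 <- lm(Sepal.Length ~ Sepal.Width + Species - 1, data = iris)
tt1  <- predict(fit1, type = "terms")

# Uncentered contribution: coefficient * observed value
sw <- coef(fit1)["Sepal.Width"] * iris$Sepal.Width
all.equal(unname(tt1[, "Sepal.Width"]), unname(sw))
```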

Centering the response (subtracting its mean) changes only the intercept; the other coefficients stay the same. When there is no intercept, no centering is done.

```r
fit2 <- lm(scale(Sepal.Length, scale=F) ~ Sepal.Width + Species, data=iris)
all.equal(coef(fit2)[-1], coef(fit)[-1])
## [1] TRUE
```
Rorschach
  • Thanks, but I realized all this. What I'm after can be explained with your example. The first `Sepal.Width` value in `iris` has a value of 3.5. In your model the coefficient for `Sepal.Width` is .8036. Therefore I want to alter the first value of the `Sepal.Width` column in tt so that it is equal to .8036 * 3.5 = 2.8126. This value is currently .3557 and my question was meant to ask how .3557 and 2.8126 are related and how I can calculate one given the other one. – stat_student Sep 04 '15 at 17:40
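The two numbers in this comment differ by the centering: 0.3557 = 0.8036 * (3.5 - mean(iris$Sepal.Width)), where the mean is approximately 3.0573. A sketch of the conversion without refitting, assuming an unweighted, full-rank `lm` with an intercept: add back, for each term, the sum of coefficient times column mean over that term's model-matrix columns. The names `offset` and `raw` are only illustrative.

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
tt  <- predict(fit, type = "terms")

mm   <- model.matrix(fit)
beta <- coef(fit)
asgn <- attr(mm, "assign")     # model-matrix column -> term index (0 = intercept)
avx  <- colMeans(mm)           # the means the terms were centered at
labs <- attr(terms(fit), "term.labels")

# Per-term offset: sum of beta_j * mean(x_j) over the term's columns
offset <- vapply(seq_along(labs),
                 function(k) sum(beta[asgn == k] * avx[asgn == k]),
                 numeric(1))

# Undo the centering column by column
raw <- sweep(tt, 2, offset, "+")

# The intercept is the only piece left over
intercept <- unname(beta["(Intercept)"])
all.equal(rowSums(raw) + intercept, predict(fit))
```

If the rows should sum to the predictions with no further adjustment, `cbind("(Intercept)" = intercept, raw)` adds the intercept as its own column.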

As far as I know, the constant is stored as an attribute to save memory. If you want `rowSums` to return the correct predicted values, you either need to add an extra column containing the constant or simply add the constant to the output of `rowSums` (see the unnecessarily verbose example below).

```r
rowSums_lm <- function(A){
   if(!is.matrix(A) || is.null(attr(A, "constant"))){
          stop("Input must be a matrix with a 'constant' attribute")
   }
   rowSums(A) + attr(A, "constant")
}
```
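The same adjustment also holds for new data, because `predict.lm` centers the new rows at the training-data means and the constant is built from those same means. A quick check, with a few `iris` rows standing in for new data (an arbitrary choice for illustration):

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Hypothetical "new" data: one row per species
nd <- iris[c(1, 51, 101), ]
tn <- predict(fit, newdata = nd, type = "terms")

# Row sums plus the constant reproduce the ordinary predictions
all.equal(rowSums(tn) + attr(tn, "constant"), predict(fit, newdata = nd))
```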
frank2165
  • Thanks, but as I mentioned in the second sentence of my question I already knew about this. I want to know how to make it so that each value in the matrix corresponds to the model coefficient times the prediction value. – stat_student Sep 04 '15 at 17:46