
I am running predict.lm on a test set of data. Not every observation gets a predicted value. When I run my predict code I get the following results:

predict(lm(Q2 ~ Q1 + Q3A + Q3B + Q3C + Q3D + Q3E + Q3H + Q5_2C + Q5_2E + Q9A + Q9B + Q9E + Q9F, data=workingtest))

1     2     3     5     6     7     8     9    10    11    12    13    14    15    16    17 
9.81  9.96 10.11  7.40  6.82  8.00 10.29  8.42  7.23  1.92  7.87  9.23  9.22  9.86  9.10  6.29 
18    19    20    21    22    23    24    26    27    28    29    30    31    32    33    34 
5.92 10.08  9.96  7.79  7.70  9.96 10.17 10.08  6.62  9.20  8.07  8.85  9.96  9.81  6.16  9.20 
35    36    37    38    39    40    41    42    43    44    45    46    47    48    49    50 
9.41  4.18  8.49  7.64  9.04  8.91  7.93  4.00  9.68  8.00  8.75  9.51  1.92  9.96  9.09  8.29 
51    52    53    54    56    57    58    59    60    61    62    63    64    65    66    67 
7.99  6.58  9.10  7.79  5.15  7.75  4.44 10.02  5.58  8.55 10.08  6.40  7.12 10.09  7.55 10.14 
68    69    70    71    72    73    74    75    76    77    79    80    81    82    83    84 
8.34 10.08  7.60 10.08 10.08  6.66  9.90  6.74  9.96  7.52  6.46  9.29 10.08  8.57  9.95  7.75 
85    86    88    89    90    91    92    93    94    95    96    97    98    99   100   101 
10.04 10.04  8.17  8.49  7.70  8.94  9.93  8.85  7.89  9.49  9.44  9.96  7.66  6.77  9.76  8.99 
102   103   104   105   106   107   108   109   110   111   112   113   114   115   116   117 
7.90  8.98  9.96 10.14 10.19  7.32  9.31  7.97  2.55  7.36  6.95  9.96  7.26  6.61 10.01  4.44 
118   119   120   121   122   123   124   125   126   127   128   129   130   131   132   133 
9.72  8.01  9.78  8.41  8.11  9.57  8.74  9.58  6.64  9.96 10.01  8.73  7.39  7.00  8.91  6.96 

I want to create a data frame that holds both the row name and the predicted result.
However, when I create the data frame, row.names is not a variable. I need it as a variable so I can match the results back to the original data set.

Here is the code I use to make the data frame. When I inspect it, it shows only 1 variable, but when I print it, both the row names and the predictions are listed. I can't figure it out!

Predicting <- data.frame(predict(lm(Q2 ~ Q1 + Q3A + Q3B + Q3C + Q3D + Q3E + Q3H + Q5_2C + Q5_2E + Q9A + Q9B + Q9E + Q9F, data=workingtest)))
  • Do this search in SO: [r] how to make rownames a variable – IRTFM Aug 25 '14 at 17:14
  • I'll point out that the question you asked isn't really the question you want answered (assuming my answer is on the right track). You'll get the best answers here at StackOverflow if your question is about what you're actually trying to do, and then you continue by telling us what you've tried and what you're having trouble with. – Aaron left Stack Overflow Aug 25 '14 at 17:19
  • The other part of your question, besides making the row.names a variable, is answered in the `?predict.lm` page: `If na.action = na.omit omitted cases will not appear in the predictions, whereas if na.action = na.exclude they will appear (in predictions, standard errors or interval limits), with value NA. See also napredict.` – IRTFM Aug 25 '14 at 17:37

1 Answer

Use the newdata parameter of predict. With its default na.action = na.pass, predict returns NA for any row that has missing data. Without newdata, it only returns values for the rows that were used in the original fit.

Your example isn't reproducible so I'll make one up.

> d <- data.frame(x=c(1:3,NA,5:6), y=1:6+rnorm(6))
> m <- lm(y~x,data=d)
> predict(m)
       1        2        3        5        6 
0.138060 1.154337 2.170614 4.203168 5.219445 
> predict(m, newdata=d)
       1        2        3        4        5        6 
0.138060 1.154337 2.170614       NA 4.203168 5.219445 

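Alternatively, you can specify na.action=na.exclude in your original fit. (The fitted values below differ from those above, presumably because the random data were regenerated between runs.)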

> m <- lm(y~x,data=d, na.action=na.exclude)
> predict(m)
       1        2        3        4        5        6 
2.023771 2.872983 3.722195       NA 5.420618 6.269830 
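
To get the row names into the data frame as an ordinary column, so the predictions can be matched back to the original data, something like the following might work. This is only a minimal sketch on made-up data: the column names `row` and `predicted` and the object `merged` are my own choices for illustration, not anything from the question.

```r
# Made-up data; the NA in x means row 4 is dropped from the fit.
set.seed(1)
d <- data.frame(x = c(1:3, NA, 5:6), y = 1:6 + rnorm(6))
m <- lm(y ~ x, data = d)

# With newdata, predict() returns one value per row of d (NA for row 4),
# as a numeric vector named by the row names of d.
p <- predict(m, newdata = d)

# Store the row names as a regular column next to the predictions.
Predicting <- data.frame(row = names(p), predicted = unname(p),
                         stringsAsFactors = FALSE)

# Match the predictions back to the original data by row name.
d$row <- rownames(d)
merged <- merge(d, Predicting, by = "row")
```

Note that `merge` sorts on the key column, and `row` is a character column here, so with many rows the order will be "1", "10", "100", ... rather than numeric. Convert `row` with `as.integer` first if the order matters.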
Aaron left Stack Overflow
  • See if you continue to believe this after reading my comment above. – IRTFM Aug 25 '14 at 17:38
  • Hi @BondedDust. The default is `na.pass`, which does as I stated here. The options you specify in your comment control how missing values are dealt with in the original fit, and would be an alternate method (edit forthcoming). – Aaron left Stack Overflow Aug 25 '14 at 17:44
  • Edit would be welcome and I'll delete these comments. – IRTFM Aug 25 '14 at 17:46