I have a general question: is there any way to identify (or tag) the observations used in a regression in R?

alligator = data.frame(lnLength = c(3.87, 3.61, NA, 3.43, 3.81, 3.83, 3.46, 3.76,
3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78), lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50, NA, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25))

t.test=lm(lnWeight ~ lnLength, data = alligator)

I want to create a data frame with another column indicating which observations are used. I know how `na.omit()`, `na.exclude()` and `complete.cases()` work, and I can use them to run the regression. What I am looking for is a way to create an indicator showing which observations were used. For Stata users: something similar to `e(sample)`.

Batanichek
Yashar

1 Answer


If I understand correctly, you can use na.action() to retrieve the indices of the rows that were excluded from the regression, and use those to compute an indicator variable:

alligator$used <- !seq_len(nrow(alligator)) %in% na.action(t.test)
alligator
##    lnLength lnWeight  used
## 1      3.87     4.87  TRUE
## 2      3.61     3.93  TRUE
## 3        NA     6.46 FALSE
## 4      3.43     3.33  TRUE
## 5      3.81     4.38  TRUE
## 6      3.83     4.70  TRUE
## 7      3.46     3.50  TRUE
## 8      3.76     4.50  TRUE
## 9      3.50       NA FALSE
## 10     3.58     3.64  TRUE
## 11     4.19     5.90  TRUE
## 12     3.78     4.43  TRUE
## 13     3.71     4.38  TRUE
## 14     3.73     4.42  TRUE
## 15     3.78     4.25  TRUE

An equivalent but probably faster method:

alligator$used <- TRUE
alligator$used[na.action(t.test)] <- FALSE
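For anyone who wants to check this end to end, here is a minimal self-contained sketch of the second method on the data from the question. The names `fit`, `used` and `same` are my own choices for illustration (I use `fit` rather than `t.test` so the model object does not share a name with the `t.test()` function), and the cross-check against `complete.cases()` assumes the model uses only these two columns.

```r
# Same data as in the question.
alligator <- data.frame(
  lnLength = c(3.87, 3.61, NA, 3.43, 3.81, 3.83, 3.46, 3.76,
               3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
  lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
               NA, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)

# `fit` is an illustrative name, used here instead of `t.test`.
fit <- lm(lnWeight ~ lnLength, data = alligator)

# Start with every row marked as used, then unmark the dropped rows.
# na.action() returns NULL when nothing was dropped, in which case
# the assignment below changes nothing.
used <- rep(TRUE, nrow(alligator))
used[na.action(fit)] <- FALSE
alligator$used <- used

# Cross-check against complete.cases() on the two model variables.
same <- identical(used, complete.cases(alligator[, c("lnLength", "lnWeight")]))
```

Here `same` should come out `TRUE`, and `sum(alligator$used)` should match `nobs(fit)`.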
bgoldst
  • Thank you. So `na.action` is another component of the object returned by `lm`, is that right? – Yashar Jun 30 '16 at 08:40
  • That's correct, it's one of the named list components on the returned object, accessible directly as `$na.action`. – bgoldst Jun 30 '16 at 08:49
  • When `subset` is used in `lm`, `rownames(alligator) %in% names(t.test$residuals)` (probably slow) seems to work. – chan1142 Sep 05 '18 at 06:35
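
To illustrate the `subset` case from the last comment, here is a rough sketch that matches on row names instead of positions. It assumes the model frame is stored on the fit (the default, `model = TRUE`), so that `model.frame()` returns exactly the rows that were used; `fit_sub` and the cutoff `3.45` are made up for the example. As far as I can tell, when `subset` is given the positions in `na.action()` refer to the already subsetted data, which is why matching on row names is the safer route here.

```r
# Same data as in the question.
alligator <- data.frame(
  lnLength = c(3.87, 3.61, NA, 3.43, 3.81, 3.83, 3.46, 3.76,
               3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
  lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
               NA, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)

# Hypothetical subset: keep only rows with a known lnLength above 3.45.
# Row 9 passes the subset but is then dropped because lnWeight is NA.
fit_sub <- lm(lnWeight ~ lnLength, data = alligator,
              subset = !is.na(lnLength) & lnLength > 3.45)

# The row names of the model frame identify the rows that entered the fit,
# regardless of whether they were removed by `subset` or by NA handling.
alligator$used <- rownames(alligator) %in% rownames(model.frame(fit_sub))
```

This flags a row as unused whether it was filtered out by `subset` or dropped for missing values, which is closer to what Stata's `e(sample)` reports.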