8

The help file for lm() doesn't go into the syntax for the subset argument. I am not sure how to get it to find the line of best fit for only a portion of my data set. This question is similar, but I wasn't able to solve my particular problem using it. How does the subset argument work in the lm() function?

Here is my code:

    with(dat[dat$SIZE <7 & dat$SIZE > 0.8 ,], plot(SP.RICH~SIZE, log="x",
      xlim=c(1,9), ylim=c(60,180), ylab="plant species richness", 
      xlab="log area (ha)", type="n"))
   with(dat[dat$SIZE <7 & dat$SIZE > 0.8 ,], points(SP.RICH~SIZE, pch=20, cex=1))
   fit=lm(SP.RICH~SIZE, subset=c(1:7))

I would like to make sure that the regression line is drawn only for the values that I subset above in the plot() and points() commands.

Community
  • 1
  • 1
eyerah
  • 129
  • 1
  • 2
  • 7
  • I realize now that I should have asked how to filter values in the lm() command, rather than how to subset them. I will try searching for more information on this now, but I will leave this question up in the meantime. – eyerah Oct 13 '15 at 22:18
  • Now I am trying something like: fit=with(dat[dat$SIZE <7 & dat$SIZE > 0.8 ,], lm(SP.RICH~SIZE)) but the line doesn't look right. I am not sure how to include the fact that the x axis is on a log scale, so that I get a proper line. – eyerah Oct 13 '15 at 22:32

2 Answers2

18

The subset parameter in lm() and other model fitting functions takes as its argument a logical vector the length of the dataframe, evaluated in the environment of the dataframe. So, if I understand you correctly, I would use the following:

fit <- lm(SP.RICH~SIZE, data=dat, subset=(SIZE>0.8 & SIZE<7))
cryptic_star
  • 1,863
  • 3
  • 26
  • 47
tom 2
  • 334
  • 2
  • 3
  • Thanks for the advice. The line of code that you provided did fit a line to my data, but unfortunately it wasn't the proper line. So you have solved my syntax question, and I now know how to filter/subset in model fitting functions. Now I just have to figure out how to get the proper line. Do you think that the problem might be that my x axis is on the log scale? – eyerah Oct 13 '15 at 22:50
1

But the above solution does not help if you want to run one lm for each group in your data - lets say that you have different countries as a column and you want to understand the relationship between richness and size within each country.

For that I recommend following the help for the function by in R http://astrostatistics.psu.edu/su07/R/html/base/html/by.html:

require(stats)
attach(warpbreaks)
by(warpbreaks[, 1:2], tension, summary)
by(warpbreaks[, 1], list(wool = wool, tension = tension), summary)
by(warpbreaks, tension, function(x) lm(breaks ~ wool, data = x))

## now suppose we want to extract the coefficients by group
tmp <- by(warpbreaks, tension, function(x) lm(breaks ~ wool, data = x))
sapply(tmp, coef)

From the list tmp you can extract any lm parameters you like.

AEM
  • 919
  • 1
  • 9
  • 22