I have this code:

dens <- read.table('DensPiu.csv', header = FALSE)
fl <- read.table('FluxPiu.csv', header = FALSE)
mydata <- data.frame(c(dens),c(fl))

dat = subset(mydata, dens>=3.15)
colnames(dat) <- c("x", "y")
attach(dat)

and I would like to run a least-squares regression on the data contained in dat. The model has the form

y ~ a + b*x

and I want the regression line to pass through a specific point P(x0,y0) (which is not the origin).

I'm trying to do it like this:

x0 <- 3.15
y0 <- 283.56

regression <- lm(y ~ I(x-x0)-1, offset=y0)

(I think that data = dat is not necessary in this case), but I get this error:

Error in model.frame.default(formula = y ~ I(x - x0) - 1, : variable
 lengths differ (found for '(offset)').

I don't know why. I guess I haven't defined the offset value correctly, but I couldn't find any example online.

Could someone explain to me how offset works, please?

ac2051
  • Can you provide a reproducible example, with data, please? – joran Jun 04 '13 at 14:41
  • What is the difference between this question and your previous one please? – agstudy Jun 04 '13 at 14:41
  • Now I'm asking how to define the object that goes in _offset_. My previous question was about how to make a regression pass through a specific point. – ac2051 Jun 04 '13 at 14:44
  • Help files say this must be a vector not a constant: `this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one are specified their sum is used. See model.offset.` – Thomas Jun 04 '13 at 14:47
  • Well, that right there is a very strong argument that you shouldn't have asked a second question. Questions on StackOverflow should be completely self contained. This is why you got some comments about the similarity between the two, as some people (rightly) thought you should have simply edited this into your previous question. – joran Jun 04 '13 at 14:56
  • in the OP's defense, I think it's a bit of a judgement call -- I do agree that in this case it would be better to edit the previous question, but I can imagine a fairly similar scenario where someone could get chewed out for editing and *not* posting as a separate question ... – Ben Bolker Jun 04 '13 at 15:19
  • Thanks. I asked this question in a comment of the previous one but nobody answered. So as it's a completely separate topic (the use of _offset_ and not the regression passing through a point) I thought that it could be treated separately. – ac2051 Jun 04 '13 at 15:25
  • @Thomas I had already read the help file but I couldn't understand it. In particular, I didn't understand what they mean by _cases_ when they say _length equal to the number of cases_. – ac2051 Jun 04 '13 at 16:26

2 Answers

Your offset term has to be a variable with one value per observation, like x and y, not a single numeric constant. So you need to create a column in your dataset with the appropriate values.

dat$o <- 283.56
lm(y ~ I(x - x0) - 1, data=dat, offset=o)
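
To check that this really forces the line through P(x0, y0), here is a minimal sketch on simulated data. The original CSV files are not available, so the data, the slope of 40 and the noise level are all assumptions made up for illustration.

```r
# Simulated stand-in for the original data (assumed values, not the real CSVs).
set.seed(1)
x0 <- 3.15
y0 <- 283.56
dat <- data.frame(x = seq(3.15, 6, length.out = 50))
dat$y <- y0 + 40 * (dat$x - x0) + rnorm(nrow(dat), sd = 5)

# One offset value per row ("case"): the constant is recycled to nrow(dat).
dat$o <- y0

fit <- lm(y ~ I(x - x0) - 1, data = dat, offset = o)

# The fitted line is y0 + b * (x - x0), so at x = x0 it should return y0.
# predict() re-evaluates the offset in newdata, so the column must be there too.
pred_at_x0 <- predict(fit, newdata = data.frame(x = x0, o = y0))
```

The only free parameter is the slope, which `coef(fit)` returns; the intercept of the usual `y ~ a + b*x` form can be recovered as `y0 - coef(fit) * x0`.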
Hong Ooi
  • Thanks for your answer. I've added the point P(x0,y0) to my dataset. Now it is the 161st element of _dat_. I've tried both `x0 <- 3.15; y0 <- dat[161,2]; regression <- lm(y ~ I(x-x0)-1, offset=y0)` and `y0 <- dat[161,]; regression <- lm(y ~ I(x)-1, offset=y0)` but neither works. What am I doing wrong? I've added the first part of the code to my question in order to make it clearer. – ac2051 Jun 04 '13 at 15:55
  • You're still creating a single constant with the given value. You need to pass a _vector_ of values as the offset. The easiest way to do this is as I posted: make a new column in your dataset. I assume `dat[161, 2]` is what your y0 is supposed to be? Do this: `dat$o <- dat[161,2]; lm(y ~ I(x - x0) - 1, offset=o, data=dat)` – Hong Ooi Jun 04 '13 at 16:01
  • Perfect, thank you very much! I had created a row instead of a column, this is why it didn't work! Thanks for your patience. – ac2051 Jun 04 '13 at 16:38
  • Is it right to use I(x - x0)? The model would assume a normal distribution of the residuals vs x-x0 and that might not always be the case. And the problem will be worse if x and x0 are correlated. I don't know how the offset affects the fitting either. And from the glm help: 'The null model will include the offset, and an intercept if there is one in the model. Note that this will be incorrect if the link function depends on the data other than through the fitted mean: specify a zero offset to force a correct calculation' – skan May 14 '17 at 19:49

The real issue here is that offset must be a vector with one element per observation, that is, as long as the number of rows of your data (or the length of y, if your data are plain vectors). A single constant such as y0 has length 1, hence the "variable lengths differ" error. The following code does what you expect:

regression <- lm(y ~ I(x-x0)-1, offset = rep(y0, length(y)))

Here is a good explanation for those who are interested: http://rfunction.com/archives/223
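
As a rough illustration of what the offset does, the sketch below compares the offset fit with a fit where y0 is simply subtracted from the response. An offset is a term added to the linear predictor with its coefficient fixed at 1, so the two fits should give the same slope. The data are simulated (assumed values), since the original files are not available.

```r
# Simulated data (assumed values for illustration only).
set.seed(42)
x0 <- 3.15
y0 <- 283.56
x <- runif(30, min = 3.15, max = 6)
y <- y0 + 25 * (x - x0) + rnorm(30, sd = 3)

# "Number of cases" = number of observations used in the fit.
n <- length(y)

# Offset version: the fitted model is y = y0 + b * (x - x0).
fit_offset <- lm(y ~ I(x - x0) - 1, offset = rep(y0, n))

# Equivalent version: move the known component to the left-hand side.
fit_shifted <- lm(I(y - y0) ~ I(x - x0) - 1)

slope_offset  <- unname(coef(fit_offset))
slope_shifted <- unname(coef(fit_shifted))

# Intercept of the constrained line in the original y ~ a + b*x form.
a <- y0 - slope_offset * x0
```

Both fits estimate the same single slope; the offset form has the advantage that fitted values and residuals are reported on the original y scale.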

Liang Zhang