5

I fit a Generalized Additive Model using gam from the mgcv package. I have a data table containing my dependent variable Y, an independent variable X, other independent variables Oth and a two-level factor Fac. I would like to fit the following model

Y ~ s(X) + Oth

BUT with the additional constraint that the s(X) term is fit only on one of the two levels of the factor, say Fac==1. The other terms Oth should be fit with the whole data.

I tried exploring s(X,by=Fac) but this biases the fit for Oth. In other words, I would like to express the belief that X relates to Y only if Fac==1, otherwise it does not make sense to model X.

yannick
  • 397
  • 3
  • 19

2 Answers2

5

Cheap trick: use an auxiliary variable that is X if Fac == 1 and 0 elsewhere.

library("mgcv")
library("ggplot2")


# simulate data

N <- 1e3

dat <- data.frame(covariate = runif(N),
                  predictor = runif(N),
                  group = factor(sample(0:1, N, TRUE)))

dat$outcome <- rnorm(N,
                     1 * dat$covariate +
                     ifelse(dat$group == 1,
                            .5 * dat$predictor +
                            1.5 * sin(dat$predictor * pi),
                            0), .1)

# some plots

ggplot(dat, aes(x = predictor, y = outcome,
                col = group, group = group)) +
    geom_point()

ggplot(dat, aes(x = covariate, y = outcome,
                col = group, group = group)) +
    geom_point()

# create auxiliary variable

dat$aux <- ifelse(dat$group == 1,
                  dat$predictor,
                  0)

# fit the data

fit1 <- gam(outcome ~ covariate + s(predictor, by = group),
            data = dat)

fit2 <- gam(outcome ~ covariate + s(aux, by = group),
            data = dat)

# compare fits

summary(fit1)

summary(fit2)
qenvio
  • 51
  • 2
  • If this was the correct answer, the value you give to the auxiliary variable in the case where the predictor is not to be modelled (here, 0) should not have an influence on the fit of the intercept. However, if you set it to, say, 10, you will see that the intercept changes greatly. – yannick Dec 15 '15 at 10:58
  • Sure @yannick , this is just a trick to "cheat" the gam function to create the `model.matrix` as you need it. I'm not sure how to do it otherwise. – qenvio Dec 15 '15 at 14:12
1

If I understand it right, you're thinking about some model with interaction like this:

Y ~ 0th + (Fac==1)*s(X)  

If you want to "express the belief that X relates to Y only if Fac==1" don't treat Fac as a factor, but as a numeric variable. In this case you will get numeric interaction and only one set of coefficients (when it's a factor there where two). This type of model is a varying coefficient model.

# some data
data <- data.frame(th = runif(100),
              X = runif(100),
              Y = runif(100),
              Fac = sample(0:1, 100, TRUE))
data$Fac<-as.numeric(as.character(data$Fac)) #change to numeric
# then run model
gam(Y~s(X, by=Fac)+th,data=data)

See the documentation for by option in the documentation ?s

Maju116
  • 1,607
  • 1
  • 15
  • 30
  • Please provide a working example. Bonus question: what if I my model is more complicated, and I already have another factor in my model, say `s(X, by=Fac2)`. How can I incorporate the above mentioned design? Can I do `s(X, by=c(Fac,Fac2))` ? The doc does not seem to allow this. – yannick Dec 15 '15 at 16:52
  • I add the example. With the second factor it depends what you want to achieve? What is the model in this case? For example if `X` relates to `Y` only when `Fac==0` and `Fac2==0` you can create new variable `New_Fac=Fac*Fac1` and then use `s(X,by=New_Fac)`. Please accept the answer if it was helpful. if you give me specification of your model i can think about it :) – Maju116 Dec 15 '15 at 17:11
  • Your answer, which is correct, does not allow to make a fit per level of factor `Fac2` in addition to what is exposed here, namely exclude the case `Fac==0`. I did not mention this in the question, but it is something I need in my application and I kept it out for clarity. So I will accept your very welcome answer but it would be great if you could propose a solution for that. I can also modify the question if you prefer. – yannick Dec 15 '15 at 17:36
  • Wow, you have really intresting model. I think I know how to solve your problem mathematically but don't know yet how to do it in R. I just need some time :) Can you tell where yo're using this model? – Maju116 Dec 15 '15 at 21:14
  • It's in an NGS application in bionformatics. If you're interested, there's also this question, which I was going to combine: http://stats.stackexchange.com/q/185871/5404 – yannick Dec 16 '15 at 09:46