0

I have a question concerning multi level regression models in R, specifically how to add predictors for my level 2 "measure".

Please consider the following example (this is not a real dataset, so the values might not make much sense in reality):

date        id    count    bmi    poll
2012-08-05  1     3        20.5   1500
2012-08-06  1     2        20.5   1400
2012-08-05  2     0        23     1500
2012-08-06  2     3        23     1400

The data contains

  • different persons ("id"...so it's two persons)
  • the body mass index of each person ("bmi", so it doesn't vary within an id)
  • the number of heart problems each person has on a specific day ("count). So person 1 had three problems on August the 5th, whereas person 2 had no difficulties/problems on that day
  • the amount of pollutants (like Ozon or sulfit dioxide) which have been measured on that given day

My general research question is, if the amount of pollutants effects the numer of heart problems in the population. In a first step, this could be a simple linear regression: lm(count ~ poll)

However, my data for each day is so to say clustered within persons. I have two measures from person 1 and two measures from person 2.

So my basic idea was to set up a multilevel model with persons (id) as my level 2 variable.

I used the nlme package for this analysis:

lme(fixed=count ~ poll, random = ~poll|id, ...)

No problems so far.

However, the true influence on level 2 might not only come from the fact that I have different persons. Rather it would be much more likely that the effect WITHIN a person might come from his or her bmi (and many other person related variables, like age, amount of smoking and so on).

To make a longstory short:

How can I specify such level two predictors in the lme function?

Or in other words: How can I setup a model, where the relationship between heart problems and pollution is different/clustered/moderated by the body mass index of a person (and as I said maybe additionally by this person's amount of smoking or age)

Unfortunately, I don't have a clue, how to tell R, what I want. I know oif other software (one of them called HLM), which is capable of doing waht I want, but I'm quite sure that R can this as well...

So, many thanks for any help!

deschen

deschen
  • 10,012
  • 3
  • 27
  • 50

1 Answers1

1

Short answer: you do not have to, as long as you correctly specify random effects. The lme function automatically detects which variables are level 1 or 2. Consider this example using Oxboys where each subject was measured 9 times. For the time being, let me use lmer in the lme4 package.

library(nlme)
library(dplyr)
library(lme4)
library(lmerTest)

Oxboys %>%                                                #1
  filter(as.numeric(Subject)<25) %>%                      #2
  mutate(Group=rep(LETTERS[1:3], each=72)) %>%            #3
  lmer(height ~ Occasion*Group + (1|Subject), data=.) %>% #4     
  anova()                                                 #5  

Here I am picking 24 subjects (#2) and arranging them into 3 groups (#3) to make this data balanced. Now the design of this study is a split-plot design with a repeated-measures factor (Occasion) with q=9 levels and a between-subject factor (Group) with p=3 levels. Each group has n=8 subjects. Occasion is a level-1 variable while Group is level 2.

In #4, I did not specify which variable is level 1 or 2, but lmer gives you correct output. How do I know it is correct? Let us check the multi-level model's degrees of freedom for the fixed effects. If your data is balanced, the Kenward–Roger approximation used in the lmerTest will give you exact dfs and F/t-ratios according to this article. That is, in this example dfs for the test of Group, Occasion, and their interaction should be p-1=2, q-1=8, and (p-1)*(q-1)=16, respectively. The df for the Subject error term is (n-1)p = 21 and the df for the Subject:Occasion error term is p(n-1)(q-1)=168. In fact, these are the "exact" values we get from the anova output (#5).

I do not know what algorithm lme uses for approximating dfs, but lme does give you the same dfs. So I am assuming that it is accurate.

Masato Nakazawa
  • 970
  • 6
  • 11
  • Well, thanks for that answer, which helps me very much. I'll try it tomorrow using the suggested lme4 package. – deschen Aug 06 '14 at 18:40
  • I've now tried to build up a multilevel model with lmer package. However, I'm a little bit stuck in specifying, which is a random and which a fixed effect (honestly, I never fully understood the concepts). To stay with my example: if _id_ contains my participants and _bmi_ is constant across these persons (ids), is that a fixed or random effect? Same for my pollutants (_poll_). They vary each day and are on level 1. So is this a random or fixed effect? – deschen Aug 08 '14 at 07:56
  • That's very a nuanced issue. I cannot tell you since I do not have sufficient information about your design/purpose of your study. You may want to check [this post out](http://stats.stackexchange.com/questions/4700/what-is-the-difference-between-fixed-effect-random-effect-and-mixed-effect-mode). I am assuming that you are a researcher, not a statistician. Then my best advice would be this: follow what people do in your discipline, unless you are willing to argue otherwise. – Masato Nakazawa Aug 08 '14 at 12:18
  • You're right. And I've already found this post and followed the links. The problem is mostly due to the fact that there are many definitions of random/fixed and it is sometimes difficult to translate the regarding definitions for the purposes of the own study. In my example above I would say that _bmi_ is a random effect, because one can say that the _bmi_-values are a random sample from the population. However, the pollutants (_poll_), which have been measured throughout the year, would be IMO fixed effects, because we have the complete measures of the year, not just a sort of random sample. – deschen Aug 08 '14 at 15:13