" Error: Invalid grouping factor specification"

Question

I'm trying to create a mixed effects model for some data I'm analyzing. It previously worked as a fixed effects before I decided to change one of the variables (countryfactor) to a random effects (random intercept) variable. The issue is that when I run it I get the following message:

"Error: Invalid grouping factor specification, countryfactor".

I've seen on other posts that this is usually an issue with there being NA entries, but I've checked all the variables in my model and none have any NA entries.

Does anyone know what might be causing this error message? Posted the model code below.

    glmer(
    formula = 
      as.numeric(wheezing_InD) ~ 
      as.factor(mainfuel) + 
      age_InD + 
      as.factor(gender_InD) +
      as.factor(school_level_InD3) +
      as.factor(enough_money_InD) +
      as.factor(cooking_location_InD) +
      as.factor(other_smokers_household_InD) +
      as.factor(AnyCondition) + 
      as.factor(owned_items_electricity_connection_R) +
      as.factor(HealthAdviceFull) +
      (1|countryfactor),
    family=poisson(link="log"),
    data = Data46)

update

Tried with a simpler model, with just the first 20 rows and the following 3 columns.

glmer(
    formula = 
      as.numeric(wheezing_InD) ~ 
      age_InD + 
      (1|countryfactor),
    family=poisson(link="log"),
    data = Data46)

Still have the same error code. Here is a sample of the first 20 rows with these 3 variables, using dput:

structure(list(wheezing_InD = c("No", "Yes", "Yes", "No", "No", 
"No", "No", "No", "Yes", "No", "No", "No", "No", "Yes", "No", 
"No", "Yes", "Yes", "Yes", "No"), age_InD = c(55L, 24L, 23L, 
30L, 40L, 43L, 37L, 38L, 18L, 23L, 28L, 33L, 27L, 54L, 23L, 23L, 
42L, 48L, 31L, 18L), countryfactor = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("cameroon", "ghana", "kenya"), class = "factor")), row.names = c(NA, 
20L), class = "data.frame")

Have also attached the str version too:

'data.frame':   20 obs. of  3 variables:
 $ wheezing_InD : chr  "No" "Yes" "Yes" "No" ...
 $ age_InD      : int  55 24 23 30 40 43 37 38 18 23 ...
 $ countryfactor: Factor w/ 3 levels "cameroon","ghana",..: 1 1 1 1 1 1 1 1 1 1 ...

Is `contryfacgtor` the name of a column in your data set? What class is it? How many rows do you have and how many unique values does it take? — Gregor Thomas, Apr 11 '22 at 13:39
Thanks for the response. Yeah its a column in the dataset, originally a character variable but modified it into a factor variable, 1199 rows, 3 unique values — user18772311, Apr 11 '22 at 13:43
Maybe post `table(Data46$countryfactor)` so we can have a look. Beyond that, we don't have a lot to go on here. Can you fit a very simple model? Say `wheezing_InD ~ ageInD + (1 | countryfactor)`? Try to make a **minimal** example that illustrates the problem and then share enough data to reproduce it. If that 2-variable formula shows the same error, try it on a subset of data. Find, say, 20 rows that demonstrate the problem and then share them so we can work on it. — Gregor Thomas, Apr 11 '22 at 14:20
If you do share data, `dput()` is the nicest way as it is copy/pasteable and includes all class and structure information. `dput(Data46[1:20, c("wheezing_InD", "ageInD", "countryfactor")])` will share the first 20 rows of those 3 columns. — Gregor Thomas, Apr 11 '22 at 14:21
Thanks for the advice, have updated the post above with this info — user18772311, Apr 11 '22 at 15:05
If this is really what your data look like (i.e. the response variable `wheezing_InD` is a *character vector* and not a factor) then `as.numeric(wheezing_inD)` will convert the entire response vector to NAs ... admittedly we could do better with the error messages here ... — Ben Bolker, Apr 11 '22 at 15:08
I see, that makes sense. I formatted it as as.numeric() because when I ran it with the variable as any other format I get: Error in mkRespMod(fr, family = family) : response must be numeric — user18772311, Apr 11 '22 at 15:14
I have just tried as.numeric(as.factor(wheezing_InD)) and the model has successfully ran! Thank you for both of your help — user18772311, Apr 11 '22 at 15:17

score 1 · Answer 1 · answered Apr 11 '22 at 16:17

If this is really what your data look like (i.e. the response variable wheezing_InD is a character vector and not a factor) then as.numeric(wheezing_inD) will convert the entire response vector to NAs ... admittedly lme4 could provide a more informative error message here ...

Binomial responses can be specified in most R modeling functions very flexibly (I would say too flexibly).

For the ‘binomial’ and ‘quasibinomial’ families the response can be specified in one of three ways:

As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).

As a numerical vector with values between ‘0’ and ‘1’, interpreted as the proportion of successful cases (with the total number of cases given by the ‘weights’).

As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.

Let's consider your options:

wheezing_inD alone will give an error (it's a character, which isn't in the allowed set)
as.factor(wheezing_inD) or factor(wheezing_inD) should work fine (option 1 above: the model will estimate the proportion of "Yes" values, since R will use alphabetical order to make "No" the first level and "Yes" the second
as.numeric(factor(wheezing_inD))-1 is OK, as.numeric(as.factor()) converts ("No", "Yes") to (1,2) and subtracting 1 gives (0,1). (This is option 2, we don't need "weights" because we only have 1 'trial' per observation (Bernoulli/binomial with n=1).

Option 3 is really only relevant for binomial data with N>1.

as_numeric(factor(wheezing_inD)) seems weird to me as it will result in (1,2) responses, which should give you an error?

I see what you mean. I would ideally use the response in a factor form (I have used it like that in a fixed effect binomial before and it worked fine). Unfortunately using the response as a factor isn't working for this model - I get the following error message; Error in mkRespMod(fr, family = family) : response must be numeric — user18772311, Apr 13 '22 at 08:32

" Error: Invalid grouping factor specification"

1 Answers1