I have a rather specific regular expressions problem that is causing me some grief. I have removed one or more fixed effects from a mixed model (either lme or lme4), and wish to remove the corresponding random slope(s). However, depending on the random structure, this may leave behind unnecessary + symbols or, worse, leave nothing preceding the |.

Take a list of random effects formulae from lme and lme4 obtained using lme.model$call$random and findbars(formula(lme4.model)) respectively:

random.structures = list(
  "~ b | random1",
  "(b | random1)",
  "~ b + x1 | random1",
  "(b + x1 | random1)",
  "~ x1 + b| random1",
  "(x1 + b| random1)",
  "~ b + x1 + c | random1",
  "(b+ x1 + c | random1)",
  "~b + x1 + x2 | random1",
  "(b + x1 + x2 | random1)",
  "~ x1 + x2 + b | random1",
  "(x1 + x2 + b | random1)"
)

I have removed the variables b and c from the fixed effects formula using dropterms. Since they no longer exist as fixed effects, their random slopes should not be allowed to vary.
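For context, here is a minimal sketch of that fixed-effects step, assuming "dropterms" refers to `stats::drop.terms`. The formula `y ~ b + x1 + c` and the response `y` are hypothetical placeholders, not taken from the actual model.

```r
# Hypothetical fixed-effects formula; `y` is a placeholder response.
fixed <- y ~ b + x1 + c

# drop.terms() works on a terms object and takes the positions of the terms to drop.
tt <- terms(fixed)
drop_idx <- which(attr(tt, "term.labels") %in% c("b", "c"))

# keep.response = TRUE retains `y` on the left-hand side.
reduced <- formula(drop.terms(tt, dropx = drop_idx, keep.response = TRUE))
print(reduced)
```

Because `drop.terms` only sees the fixed-effects terms, the random structure still has to be adjusted separately, which is the problem described here.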

b and c can be removed from the random formulae above using the following line:

random.structures = lapply(random.structures, function(i) gsub("b|c", "", i))

Now, I wish to remove all leftover + symbols, i.e., those that do not link variables.

Then, in the event there is a blank space between ~ or ( and |, I wish to insert a 1.

The desired output is

random.structures2 = list(
  "~ 1 | random1",
  "(1 | random1)",
  "~ x1 | random1",
  "(x1 | random1)",
  "~ x1 | random1",
  "(x1 | random1)",
  "~ x1 | random1",
  "(x1 | random1)",
  "~ x1 + x2 | random1",
  "(x1 + x2 | random1)",
  "~ x1 + x2 | random1",
  "(x1 + x2 | random1)"
)
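One possible regex-only route to that output is sketched below. It is not a tested general solution: `drop_slopes` is a hypothetical helper name, the strings are held in a character vector rather than a list, and the variables are removed as whole words (`\\b...\\b`) rather than with the plain `"b|c"` pattern, so that a name such as `block` would be left alone. It assumes variable names contain no regex metacharacters.

```r
# Input strings from the question, stored as a character vector.
random.structures <- c(
  "~ b | random1",           "(b | random1)",
  "~ b + x1 | random1",      "(b + x1 | random1)",
  "~ x1 + b| random1",       "(x1 + b| random1)",
  "~ b + x1 + c | random1",  "(b+ x1 + c | random1)",
  "~b + x1 + x2 | random1",  "(b + x1 + x2 | random1)",
  "~ x1 + x2 + b | random1", "(x1 + x2 + b | random1)"
)

# Hypothetical helper (not part of nlme or lme4).
drop_slopes <- function(s, drop) {
  # Remove the dropped names as whole words only.
  pattern <- paste0("\\b(", paste(drop, collapse = "|"), ")\\b")
  s <- gsub(pattern, "", s, perl = TRUE)
  s <- gsub("\\+(\\s*\\+)+", "+", s, perl = TRUE)        # "+  +" left by a middle term
  s <- gsub("([~(])\\s*\\+", "\\1", s, perl = TRUE)      # "+" right after ~ or (
  s <- gsub("\\+\\s*\\|", "|", s, perl = TRUE)           # "+" right before |
  s <- gsub("([~(])\\s*\\|", "\\1 1 |", s, perl = TRUE)  # nothing left: intercept only
  # Normalise the spacing.
  s <- gsub("\\s+", " ", s, perl = TRUE)
  s <- gsub("~\\s*", "~ ", s, perl = TRUE)
  s <- gsub("\\(\\s+", "(", s, perl = TRUE)
  gsub("\\s*\\|\\s*", " | ", s, perl = TRUE)
}

cleaned <- drop_slopes(random.structures, c("b", "c"))
print(cleaned)
```

Each `gsub` handles one kind of leftover, which keeps the patterns short and easy to check one at a time.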

I have fiddled with gsub but just can't seem to get it right. For instance, this works:

gsub("(.*)\\+\\ |(.*)\\+(\\|)", "\\1", random.structures[[3]])
# Accounting for space or lack of space between + and |

But not for this:

gsub("(.*)\\+\\ |(.*)\\+(\\|)", "\\1", random.structures[[7]])

Alternately, if there is a preexisting function like dropterms for random structures, I'm all in!

Similarly, I can't reliably insert a 1 in the blank space between ~ and | or between ( and |.

jslefche
  • Are you sure you want to solve this with regular expressions? If you are working with formulas, there are other functions for manipulating formulas that won't result in invalid syntax. If you want to remove a variable, try `update(y~a+b, ~.-b)` – MrFlick May 07 '15 at 19:20
  • Will this work for mixed models, particularly those in the `lme4` package where `update` works on the whole formula (fixed and random effects)? – jslefche May 07 '15 at 19:22
  • It would be helpful if you provided a reproducible example of what you are *actually* trying to accomplish. Provide input formulas, the variables you want to remove, and desired output. – MrFlick May 07 '15 at 19:25
  • I have updated the question to show what the entire `random.structures` list should look like, in addition to the two examples I provided. – jslefche May 07 '15 at 19:36
  • But your example still starts with an arbitrary string (invalid formula). What did the formulas look like before you removed the variable? How did you attempt to remove the variable? There are probably better ways to remove the variable in the first place. It's better to fix the problem at the source rather than cleaning up downstream. – MrFlick May 07 '15 at 19:38
  • I've added a bit more detail that perhaps will elucidate my motivations more clearly. Let me know if you need further detail! – jslefche May 07 '15 at 19:56
  • Ah well, I've given up on answering this, but (1) consider using a vector; there's no need for a list here and (2) perhaps you can construct the formulas you want instead of parsing them after the fact. – Frank May 07 '15 at 20:48
  • Piecing together the random formula instead of replacing existing variables proved to be the most effective method -- thanks for the suggestion @Frank – jslefche May 12 '15 at 19:21
  • @jslefche Glad to hear you found a solution. If you have time and inclination, you could write your approach up as an answer to your own question. – Frank May 12 '15 at 19:25

1 Answer

Half the items in your starting list are proper formulas (the ones with the "~"). I'm not sure what you are doing with the terms in parentheses. But for the formulas, you can use the Formula package, which has better support for dropping terms from formulas with conditioning terms.

Here I'll subset to the proper formulas and convert to Formula objects.

library(Formula)
rx <- lapply(random.structures[grep("~", random.structures)],
    function(x) Formula(as.formula(x)))

We can quickly peek at the results with

sapply(rx, deparse)

# [1] "~b | random1"
# [2] "~b + x1 | random1"
# [3] "~x1 + b | random1"
# [4] "~b + x1 + c | random1"
# [5] "~b + x1 + x2 | random1"
# [6] "~x1 + x2 + b | random1"

Now we can remove b and c from all of these with

nx <- lapply(rx, function(x) update(x, ~.-b-c))

and view the results with

sapply(nx, deparse)

# [1] "~1 | random1" 
# [2] "~x1 | random1"
# [3] "~x1 | random1"
# [4] "~x1 | random1"
# [5] "~x1 + x2 | random1"
# [6] "~x1 + x2 | random1"

You should have no problem using these wherever you would use regular formulas.
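For the parenthesised `lme4`-style terms, which the code above skips, one option is to treat each term as an R language object rather than a string. The sketch below uses only base R (`str2lang` needs R 3.6 or later); `drop_from_bar` is a hypothetical helper, not a function from `lme4` or `Formula`, and it assumes each string holds a single `(lhs | group)` term.

```r
# Hypothetical helper: drop slopes from one lme4-style bar term given as a string.
drop_from_bar <- function(bar_string, drop) {
  expr <- str2lang(bar_string)            # "(b + x1 | random1)" becomes a call
  # Strip the outer parentheses to reach the `|` call.
  while (is.call(expr) && identical(expr[[1]], as.name("("))) {
    expr <- expr[[2]]
  }
  stopifnot(is.call(expr), identical(expr[[1]], as.name("|")))

  # Turn the left-hand side of the bar into a one-sided formula and update it.
  lhs <- eval(call("~", expr[[2]]))
  removal <- as.formula(paste("~ . -", paste(drop, collapse = " - ")))
  new_lhs <- update(lhs, removal)[[2]]    # becomes 1 when no slopes remain

  # Reassemble "(new_lhs | group)" and return it as a string.
  deparse(call("(", call("|", new_lhs, expr[[3]])))
}

bars <- c("(b | random1)", "(b+ x1 + c | random1)", "(x1 + x2 + b | random1)")
result <- vapply(bars, drop_from_bar, character(1), drop = c("b", "c"),
                 USE.NAMES = FALSE)
print(result)
```

Since `update` does the term arithmetic, there are no stray `+` symbols to clean up afterwards and an empty left-hand side falls back to `1`.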

MrFlick
  • Hmm, this is an interesting approach and should work for `lme` where the random effects formula is stored separately -- thanks! I wasn't aware `Formula` would preserve the bars. But unfortunately it will not work with the syntax from `lmer` -- it will not remove or replace the random slope, e.g.: `x = "y ~ x2 + (x2 | random1)"; x = Formula(as.formula(x)); update(x, ~.-x2)` – jslefche May 07 '15 at 20:48