
I found this pdf on R formulas and I cannot figure out how the | works (see the table on the second page). I could not find any explanation on the web either. It appears from time to time in lists of possible formula symbols, but without any example.

I suspect it might be outdated, since there are other ways to achieve whatever it did.

Does anybody know how to use | in a formula and what it exactly achieves?

Here is a bit of code which shows my clumsy attempt to use |:

x <- rnorm(100)
y <- rnorm(100)
z <- sample(c(TRUE, FALSE), 100, replace = TRUE)

lm(y ~ x|z)
Karolis Koncevičius
Alex
  • Where is it used? – Alex Feb 23 '17 at 14:07
  • `|` is a logical `OR` that evaluates elementwise. And, I am sure there are ample resources on here that could help explain that. – Abdou Feb 23 '17 at 14:08
  • A number of modelling packages use it including `lfe`, `AER` (`ivreg`), and `lme4`. – lmo Feb 23 '17 at 14:09
  • @Abdou I know how the logical OR works in R. But some symbols such as `*` have a special meaning if used in a formula. This is what I expected for the `|` as well. – Alex Feb 23 '17 at 14:22
  • @Alex, I wasn't aware that some packages are using it to denote conditional probability and such. Good to know. – Abdou Feb 23 '17 at 14:44

2 Answers


The symbol | means different things depending on the context:

The general case

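In general, | means OR. General modeling functions such as lm() see any | as a logical operator and simply evaluate it. This works the same way as any other function or operator that has no special meaning in formula syntax. Note that ^ is not such an operator: inside a formula it denotes crossing of terms, so y ~ x + x^2 reduces to y ~ x. An arithmetic square has to be protected with I(), as in:

lm(y ~ x + I(x^2))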

The operation is evaluated first, and the resulting variable is then used to construct the model matrix and fit the model.
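One way to check how R takes a formula apart is stats::terms(). An operator without special formula meaning, like |, ends up inside a single term that is evaluated later, whereas ^ is part of formula syntax (crossing), which is why an arithmetic square needs I(). A small sketch; term_labels() is just a hypothetical convenience wrapper, and no data is needed because terms() only inspects the formula:

```r
# Hypothetical convenience wrapper: list the model terms R builds from a formula
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(y ~ x | z)       # "x | z": one term, later evaluated as a logical OR
term_labels(y ~ x + x^2)     # "x": ^ means crossing, so x^2 collapses to x
term_labels(y ~ x + I(x^2))  # "x" "I(x^2)": I() forces arithmetic evaluation
```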

In your code, | also means OR. Keep in mind that R interprets numeric values as logical whenever you use a logical operator: 0 is seen as FALSE, anything else as TRUE.

So your call to lm constructs a model of y as a function of x OR z, which doesn't make any sense. Because x comes from rnorm() and is (almost surely) never exactly 0, x | z is TRUE for every observation, so the model is effectively y ~ TRUE. This is also why your model can't be fitted properly: the model matrix has 2 columns of 1's, one for the intercept and one for the only value in x | z, TRUE. These two columns are perfectly collinear, so the coefficient for x | z can't be estimated, as the output shows:

> lm(y ~ x|z)

Call:
lm(formula = y ~ x | z)

Coefficients:
(Intercept)    x | zTRUE  
   -0.01925           NA  
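The behaviour described above can be reproduced with a short sketch. It assumes an arbitrary set.seed() that the question's code did not have, so the exact numbers will differ from the output shown:

```r
set.seed(42)  # arbitrary seed, added only for reproducibility
x <- rnorm(100)
y <- rnorm(100)
z <- sample(c(TRUE, FALSE), 100, replace = TRUE)

# Inside lm(), x | z is just the elementwise OR of x and z,
# with x coerced to logical (0 -> FALSE, anything else -> TRUE)
xz <- x | z
all(xz)  # TRUE: rnorm() draws are never exactly 0

fit <- lm(y ~ x | z)
coef(fit)  # the "x | zTRUE" coefficient is NA (aliased with the intercept)

# With that column dropped, the fit is intercept-only, so the intercept is mean(y)
all.equal(unname(coef(fit)[1]), mean(y))
```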

Inside formulas for mixed models

In mixed models (e.g. the lme4 package), | is used to indicate a random effect. A term like + (1 | X) (lme4 requires the parentheses) means: "fit a random intercept for every category in X". You can read the | as "given", so the term reads as "fit an intercept, given X". If you keep this in mind, the use of | in the specification of correlation structures in e.g. the nlme or mgcv packages will make more sense to you.
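A minimal sketch of the "given" reading, using nlme because it ships with standard R installations. The Orthodont dataset (from nlme) is only an assumption for illustration; the lme4 equivalent is shown as a comment, since lme4 may not be installed:

```r
library(nlme)  # recommended package; provides lme(), gls() and the Orthodont data

# Random intercept per Subject: read `~ 1 | Subject` as "an intercept, given Subject"
fit_lme <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
nrow(ranef(fit_lme))  # one estimated random intercept per subject

# Same reading in a correlation structure: observations are correlated
# within each Subject and independent across subjects
fit_gls <- gls(distance ~ age, data = Orthodont,
               correlation = corCompSymm(form = ~ 1 | Subject))

# lme4 spelling (not run here): the term sits in parentheses in the main formula
# lme4::lmer(distance ~ age + (1 | Subject), data = Orthodont)
```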

You still have to be careful, as the exact way | is interpreted depends largely on the package you use. The only way to really know what it means for a given modeling function is to check that package's documentation.

Other uses

There are some other functions and packages that use the | symbol in a formula interface. Here too it mostly boils down to indicating some kind of group. One example is the use of | in the lattice graphics system, where it is used for faceting (conditioning), as shown by the following code:

library(lattice)
densityplot(~ Sepal.Width | Species,
            data = iris,
            main = "Density Plot by Species",
            xlab = "Sepal width")
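The lattice object itself records one panel per level of the conditioning variable, which can be inspected with dim() and dimnames(). Several conditioning variables are joined with *. A sketch, where the size factor is a hypothetical variable created only for illustration:

```r
library(lattice)

# One panel per level of Species
p <- densityplot(~ Sepal.Width | Species, data = iris)
dim(p)       # number of panels along each conditioning variable
dimnames(p)  # the levels that define those panels

# Hypothetical second grouping factor, built on a copy of iris
iris2 <- iris
iris2$size <- factor(ifelse(iris2$Sepal.Length > median(iris2$Sepal.Length),
                            "long", "short"))

# Two conditioning variables joined with *: one panel per Species x size combination
p2 <- densityplot(~ Sepal.Width | Species * size, data = iris2)
dim(p2)
```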
Alex
Joris Meys
  • So it is not used in general in formulas but only implemented in some packages. Base R functions such as `lm()` evaluate it as simple logical operator without special meaning in the context of the formula? – Alex Feb 23 '17 at 14:19
  • @Alex you got it. I added that bit of info to make that more obvious. – Joris Meys Feb 23 '17 at 14:19
  • You can find more information on the intro vignette of the `lme4` package : https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf – Joris Meys Feb 23 '17 at 14:21
  • Joris: you should also enrich with the possible usage in trellis graphics based on formulas (panels/conditions) - still I upvote for you – Eric Lecoutre Feb 23 '17 at 14:23
  • @EricLecoutre Good point. Hardly anybody uses it any more, but I included the `lattice` use for completeness. – Joris Meys Feb 23 '17 at 14:31
  • is there any way to construct a formula object with `|` as a grouping variable for use in those packages? I.e., I would like to call something like `rlang::new_formula(lhs, rhs, group)` – Dylan Russell Aug 30 '20 at 04:44

The general way it is used is `dependent ~ independent | grouping`. You can read more here: http://talklab.psy.gla.ac.uk/KeepItMaximalR2.pdf
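In R's parse tree, the `dependent ~ independent | grouping` pattern is just a `~` call whose right-hand side is a `|` call. That also suggests a way to build such a formula programmatically, as asked in a comment on the first answer. A minimal base-R sketch; make_grouped_formula() and split_grouped_formula() are hypothetical helpers, not functions from rlang or any modeling package:

```r
# Hypothetical helper: build `lhs ~ rhs | group` from strings or language objects
make_grouped_formula <- function(lhs, rhs, group, env = parent.frame()) {
  to_lang <- function(x) if (is.character(x)) str2lang(x) else x
  f <- eval(bquote(.(to_lang(lhs)) ~ .(to_lang(rhs)) | .(to_lang(group))))
  environment(f) <- env  # look up variables in the caller's environment
  f
}

# Hypothetical helper: take a formula with a top-level `|` apart again
split_grouped_formula <- function(f) {
  rhs <- f[[length(f)]]  # right-hand side of a one- or two-sided formula
  if (!is.call(rhs) || !identical(rhs[[1]], as.name("|")))
    stop("the right-hand side has no top-level `|`")
  list(lhs   = if (length(f) == 3) f[[2]] else NULL,
       rhs   = rhs[[2]],
       group = rhs[[3]])
}

f <- make_grouped_formula("y", "x1 + x2", "g")
deparse(f)                # "y ~ x1 + x2 | g"
parts <- split_grouped_formula(f)
deparse(parts$rhs)        # "x1 + x2"
```

This only handles a `|` at the top level of the right-hand side, as in lattice-style formulas. lme4-style terms such as `(1 | g)` are nested inside `+` and parentheses, so they would need a recursive walk of the call.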

Dinesh.hmn