I have two questions about dplyr, which are related by the problem I am trying to solve:

  1. How can I cross-classify a data_frame using pipes, when passing the result of a series of operations to xtabs?
  2. The argument of a pipe is usually denoted by . in dplyr and magrittr, but . is also the token that denotes "all other variables" in the formula interface. I recall an open issue on dplyr (I can't locate it right now) that discusses replacing . with _.

Here is an example:

wakefield::r_data_frame(
  n = 100,
  cat1 = r_sample_factor(x = LETTERS[1:3]),
  cat2 = r_sample_factor(x = LETTERS[1:3]),
  cat3 = r_sample_factor(x = LETTERS[1:3]),
  bin1 = r_sample_logical()
) %>%
  dplyr::filter(bin1) %>%
  xtabs(. ~ cat1 + cat2 + cat3, data = .)

which fails with the output:

Error in model.frame.default(formula = . ~ cat1 + cat2 + cat3, data = .) : 
  invalid type (list) for variable '.' 

because magrittr is replacing the first . with the data_frame resulting from the previous computations. One workaround is to omit the first period altogether, like so:

wakefield::r_data_frame(
  n = 100,
  cat1 = r_sample_factor(x = LETTERS[1:3]),
  cat2 = r_sample_factor(x = LETTERS[1:3]),
  cat3 = r_sample_factor(x = LETTERS[1:3]),
  bin1 = r_sample_logical()
) %>%
  dplyr::filter(bin1) %>%
  xtabs( ~ cat1 + cat2 + cat3, data = .)

But what if the . needed to go on the other side of the formula?

Edit:

As pointed out by @MrFlick, xtabs does not take a LHS . anyway. I thought that this problem could just as well be exemplified by the RHS . conflict, which I expected to see with the code:

wakefield::r_data_frame(
  n = 100,
  cat1 = r_sample_factor(x = LETTERS[1:3]),
  cat2 = r_sample_factor(x = LETTERS[1:3]),
  cat3 = r_sample_factor(x = LETTERS[1:3]),
  bin1 = r_sample_logical()
) %>%
  dplyr::filter(bin1) %>%
  dplyr::select(-bin1) %>%
  xtabs( ~ ., data = .)

but this works exactly as expected. Can someone explain why magrittr is not trying to replace the first . with the data_frame?

tchakravarty
  • use a string formula and paste in `paste(names(.), collapse = '+')` as necessary – eddi May 07 '15 at 17:13
  • Where does `wakefield::r_data_frame` come from? I'm having difficulty reproducing the problem. – MrFlick May 07 '15 at 17:26
  • @MrFlick The namespace lookup indicates where it comes from, in this case [here](https://github.com/trinker/wakefield). – tchakravarty May 07 '15 at 17:28
  • I googled it and that was not near the top of the pile. Packages not on CRAN are not easy to find if you don't know where to look. You should make sure to include that in the question itself. Also, your problem/issue doesn't seem contingent on that package so why require it as a dependency for your minimal reproducible example? – MrFlick May 07 '15 at 17:30
  • Also, in what cases is a `.` valid on the left-hand side of a formula? Are you claiming it works when not using magrittr? – MrFlick May 07 '15 at 17:32
  • @MrFlick Because, as I well know, without an MWE there would be further trouble. And the `wakefield` package is an easy way to generate data for examples. – tchakravarty May 07 '15 at 17:32
  • @MrFlick Great catch. I will reword my question to ask about `.` on the RHS. – tchakravarty May 07 '15 at 17:33
  • Why not use count()? – hadley May 09 '15 at 01:54
  • @hadley Need the table layout for easy reading. Your suggestion is good too, which would be to `group_by` and `count`. – tchakravarty May 09 '15 at 09:21
  • @hadley Or, as I have just realised, `count` does the `group_by` for you. – tchakravarty May 09 '15 at 09:35
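
The two alternatives suggested in the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions: the small `df` is a hand-made stand-in for the wakefield data, and the object names `tab` and `cnt` are made up for the example.

```r
library(dplyr)  # also makes %>% available

# Hypothetical stand-in for the wakefield example data.
df <- data.frame(
  cat1 = c("A", "B", "A", "C", "B"),
  cat2 = c("B", "B", "C", "A", "A"),
  bin1 = c(TRUE, TRUE, FALSE, TRUE, TRUE)
)

# String-formula approach: build "~ cat1+cat2" from the column names
# of the piped data frame, then hand it to xtabs().
tab <- df %>%
  filter(bin1) %>%
  select(-bin1) %>%
  xtabs(as.formula(paste("~", paste(names(.), collapse = "+"))), data = .)

# count() approach: one row per observed combination, in long format
# rather than as a contingency table. count() groups by the given columns.
cnt <- df %>%
  filter(bin1) %>%
  count(cat1, cat2)
```

`tab` keeps the table layout that xtabs produces, while `cnt` is a data frame with an `n` column, so which one fits depends on whether the table layout is needed for reading.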

1 Answer

magrittr does not replace periods (".") in formulas. That behavior is described briefly in the vignette (search for "formula").
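
A minimal sketch of that behavior, assuming a small hand-made data frame `df` in place of the wakefield data (the names `df`, `piped`, and `direct` are made up for this illustration):

```r
library(magrittr)

# Hypothetical stand-in data with two categorical columns.
df <- data.frame(
  cat1 = c("A", "B", "A", "C"),
  cat2 = c("B", "B", "C", "A")
)

# Only `data = .` receives the piped data frame. The `.` inside the
# formula reaches xtabs() untouched, where it expands to all columns.
piped <- df %>% xtabs(~ ., data = .)

# The same table, written without the pipe and without the formula dot.
direct <- xtabs(~ cat1 + cat2, data = df)
```

The two tables hold the same counts. As for the first example in the question, the error there appears to come from the formula machinery looking up a variable named `.` and finding the piped data frame under that name, rather than from magrittr rewriting the formula.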

MrFlick