I am trying to calculate several binomial proportion confidence intervals. My data are in a data frame, and though I can successfully extract the estimate from the object returned by prop.test, the conf.int variable seems to be null when run on the data frame.

library(dplyr)

cases <- c(50000, 1000, 10, 2343242)
population <- c(100000000, 500000000, 100000, 200000000)

df <- as.data.frame(cbind(cases, population))
df %>% mutate(rate = prop.test(cases, population, conf.level=0.95)$estimate)

This appropriately returns

    cases population       rate
1   50000      1e+08 0.00050000
2    1000      5e+08 0.00000200
3      10      1e+05 0.00010000
4 2343242      2e+08 0.01171621

However, when I run

df %>% mutate(confint.lower = prop.test(cases, population, conf.level=0.95)$conf.int[1])

I sadly get

Error in mutate_impl(.data, dots) : 
  Column `confint.lower` is of unsupported type NULL

Any thoughts? I know alternative ways to calculate the binomial proportion confidence interval, but I would really like to learn how to use dplyr well.

Thank you!

Edward
PBB
  • @akrun my apologies -- vestigial evidence of real data vs my attempt to share a reproducible chunk. I edited the code. Thanks. – PBB Jun 21 '18 at 21:04
  • What is `sumcases`? :) – SeGa Jun 21 '18 at 21:08
  • If we dig into the help page `?prop.test`, the `Value` section describes `conf.int` as "a confidence interval for the true proportion if there is one group, or for the difference in proportions if there are 2 groups and p is not given, or NULL otherwise". So you need to test either one or two groups to get a non-`NULL` value for your `conf.int`, not the 4 groups that are currently being tested. – Nate Jun 21 '18 at 21:10
  • @SeGa so sorry, shouldn't have tried to do a hasty edit of variable names just before pasting. sumcases=cases. Post edited. thank you. – PBB Jun 21 '18 at 21:18
  • @Nate I saw that, but I thought that `dplyr` would be doing a sort of row-wise call of `prop.test` and thus the rows would each be considered individually, and the first part of your quoted section ("a confidence interval for the true proportion if there is one group") would apply. Am I misunderstanding `dplyr`? – PBB Jun 21 '18 at 21:26
  • Try `library(purrr) ; library(dplyr) ; df %>% mutate(confint.lower = map2(.x = cases, .y = population, .f = ~ prop.test(.x, .y, conf.level=0.95)$conf.int[1]))`. I haven't really dug into the htest class to figure out why your version isn't working, but this should. EDIT: Actually, on quick reflection, it's probably not working because `prop.test` isn't vectorized. – Jake Fisher Jun 21 '18 at 21:34

2 Answers

You can use dplyr::rowwise() to group on rows:

df %>%
    rowwise() %>%
    mutate(lower_ci = prop.test(cases, population, conf.level=0.95)$conf.int[1])

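By default, dplyr hands each column to the function as a whole vector, so prop.test() sees all four rows at once and treats them as four groups, for which conf.int is NULL. Functions that are vectorized element by element, as @Jake Fisher mentioned above, just work without rowwise() added.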
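
To illustrate the difference, here is a minimal base R sketch using the data from the question. The names all_rows and first_row are only for illustration. It compares one call on the whole columns with one call on a single row:

```r
# Data from the question
cases <- c(50000, 1000, 10, 2343242)
population <- c(100000000, 500000000, 100000, 200000000)

# All four rows at once: a 4-group test of equal proportions,
# which has no confidence interval to report
all_rows <- prop.test(cases, population, conf.level = 0.95)
is.null(all_rows$conf.int)   # TRUE
length(all_rows$estimate)    # 4, one estimate per group

# One row at a time: a 1-group test, so conf.int holds a lower
# and an upper bound for that row's proportion
first_row <- prop.test(cases[1], population[1], conf.level = 0.95)
first_row$conf.int
```

This is why the estimate column fills in without rowwise() (four estimates for four rows) while conf.int does not.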

This is what I would do to catch all of the confidence interval components at once:

df %>%
    rowwise() %>%
    mutate(tst = list(broom::tidy(prop.test(cases, population, conf.level=0.95)))) %>%
    tidyr::unnest(tst)
Nate

Update: as of dplyr 1.0.0, rowwise() is no longer in the "questioning" lifecycle stage.

When this answer was written (dplyr 0.8.3), the lifecycle status of the rowwise() function was "questioning".

As an alternative, I would rather recommend the use of purrr::map2() to achieve the goal:

df %>%
  mutate(rate = purrr::map2(cases, population, ~ prop.test(.x, .y, conf.level=0.95) %>%
                                                   broom::tidy())) %>%
  tidyr::unnest(rate)
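
If only the two bounds are needed as plain numeric columns, a variant with purrr::map2_dbl() avoids the list column and the unnest step. This is a sketch rather than the only way to do it: ci_bound is a hypothetical helper defined here, not a function from any package, and confint.upper is a column name chosen to match the question's confint.lower.

```r
library(dplyr)
library(purrr)

df <- data.frame(
  cases = c(50000, 1000, 10, 2343242),
  population = c(100000000, 500000000, 100000, 200000000)
)

# Hypothetical helper: run a one-group prop.test and return one bound
# of its confidence interval (bound = 1 for lower, 2 for upper)
ci_bound <- function(x, n, bound, level = 0.95) {
  prop.test(x, n, conf.level = level)$conf.int[bound]
}

result <- df %>%
  mutate(
    rate = cases / population,
    # map2_dbl calls the helper once per row and returns a numeric vector
    confint.lower = map2_dbl(cases, population, ~ ci_bound(.x, .y, bound = 1)),
    confint.upper = map2_dbl(cases, population, ~ ci_bound(.x, .y, bound = 2))
  )

result
```

Note that this runs prop.test twice per row. For large data frames, the broom::tidy() approach above computes each test once.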
Nate
Marc Choisy