
I've got the data frame below and am trying to test whether there is a significant difference in the proportions between the groups within each category, e.g. for category A: group 1 versus 2, 1 versus 3, and 2 versus 3.

Is there a way to calculate the p-values and add them to the data frame as new columns, without having to compute and add them manually one row at a time?

Or is there a way to calculate them and store them in a separate data frame?

          Group   Category number min total  Proportion
1          1         A      6  2.5    33 0.1818182
2          1         B      4  3.2    33 0.1212121
3          1         C     16  3.2    33 0.4848485
4          1         D      7  3.1    33 0.2121212
5          2         A     22  6.4   133 0.1654135
6          2         B     17  6.7   133 0.1278195
7          2         C     56  6.0   133 0.4210526
8          2         D     38  6.4   133 0.2857143
9          3         A      3 10.0    22 0.1363636
10         3         B      3  9.7    22 0.1363636
11         3         C      9 10.6    22 0.4090909
12         3         D      7  9.9    22 0.3181818

The task looks simple, but the solution is somewhat involved. Here is one approach that uses the purrr package as the core tool.
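The building block in every approach is base R's prop.test(), which tests whether two proportions are equal. As a minimal sketch, here is a single comparison from the question, category A group 1 (6 of 33) versus group 2 (22 of 133); the object name res is just for illustration.

```r
# Category A: group 1 has 6 successes out of 33, group 2 has 22 out of 133
res <- prop.test(x = c(6, 22), n = c(33, 133))

res$estimate  # the two sample proportions
res$p.value   # two-sided p-value (Yates' continuity correction is on by default)
```

Everything below is about running this call for every relevant pair and collecting the p-values.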

Let's import the data:

data <- structure(list(Group = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 
 3L, 3L), Category = c("B", "C", "D", "A", "B", "C", "D", "A", 
 "B", "C", "D"), number = c(4L, 16L, 7L, 22L, 17L, 56L, 38L, 3L, 
 3L, 9L, 7L), min = c(3.2, 3.2, 3.1, 6.4, 6.7, 6, 6.4, 10, 9.7, 
 10.6, 9.9), total = c(33L, 33L, 33L, 133L, 133L, 133L, 133L, 
 22L, 22L, 22L, 22L), Proportion = c(0.1212121, 0.4848485, 0.2121212, 
 0.1654135, 0.1278195, 0.4210526, 0.2857143, 0.1363636, 0.1363636, 
 0.4090909, 0.3181818)), row.names = 2:12, class = "data.frame")
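Note that this dput output has row.names = 2:12, so it leaves out the first row of the question's table (Group 1, Category A); that is why group 1 in the output further down only has categories B, C and D. A sketch that rebuilds all 12 rows from the question's table by hand (the name props is hypothetical, chosen so it does not clash with data):

```r
# All 12 rows of the question's table, typed in by hand
props <- data.frame(
  Group    = rep(1:3, each = 4),
  Category = rep(c("A", "B", "C", "D"), times = 3),
  number   = c(6, 4, 16, 7, 22, 17, 56, 38, 3, 3, 9, 7),
  min      = c(2.5, 3.2, 3.2, 3.1, 6.4, 6.7, 6.0, 6.4, 10.0, 9.7, 10.6, 9.9),
  total    = rep(c(33, 133, 22), each = 4),
  stringsAsFactors = FALSE
)
# recompute the proportion column from the counts
props$Proportion <- props$number / props$total
```

Feeding props into the same steps should then give six category pairs for group 1 instead of three.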

and load the required packages:

library(dplyr) # select, mutate, group_by and rowwise
library(tidyr) # nest
library(purrr) # map
# combn() is part of base R (package utils), so library(combinat) is not needed
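For reference, combn() is already available in base R (package utils, attached by default), so it does not require the combinat package. A small check of what it returns, here for three groups:

```r
# combn(x, 2) returns a 2-row matrix with one unordered pair per column
pairs <- combn(1:3, 2)
pairs
#      [,1] [,2] [,3]
# [1,]    1    1    2
# [2,]    2    3    3

# transposed to one pair per row, as combPval() does before data.frame()
t(pairs)
```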

First we create a tibble foo that splits the original dataset by Group: one row per group, with that group's rows nested in a list-column called data. This allows us to map a function over the groups.

foo <- data %>% group_by(Group) %>% nest()
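Grouping and nesting by Group amounts to splitting the rows into one small data frame per group. For readers without the tidyverse, roughly the same split can be sketched in base R with split(); the subset d below (categories A and B only) is just for illustration.

```r
d <- data.frame(
  Group    = c(1, 1, 2, 2, 3, 3),
  Category = c("A", "B", "A", "B", "A", "B"),
  number   = c(6, 4, 22, 17, 3, 3),
  total    = c(33, 33, 133, 133, 22, 22),
  stringsAsFactors = FALSE
)

# named list with one data frame per group: "1", "2", "3"
by_group <- split(d, d$Group)
names(by_group)
by_group[["2"]]
```

lapply(by_group, some_function) then plays the role that map(data, ...) plays on the nested tibble.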

Now we define our own function combPval, which 1) creates a data frame of all pairs of categories (combTab), and 2) creates a data frame tab1 holding the columns prop.test needs (Category, number, total). The two are then merged into a data frame data with one row per pair of categories, and prop.test is applied row by row.

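combPval <- function(group){
 # all unordered pairs of categories, one pair per row (columns X1, X2)
 combTab <- combn(unique(group$Category), 2) %>% t() %>% data.frame()
 # counts and totals needed by prop.test
 tab1 <- group %>% select(Category, number, total)
 # attach the second category of each pair (X2), then the first (X1);
 # the first category's columns end in .x, the second's in .y
 temp <- merge(y=combTab, x=tab1, by.y="X2", by.x="Category")
 data <- merge(y=temp, x=tab1, by.y="X1", by.x="Category")

 # one two-sample proportion test per pair of categories
 data <- data %>% 
  rowwise() %>% 
  mutate(
   pval = prop.test(x=c(number.x, number.y), n=c(total.x, total.y))$p.value
  )

 data
}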
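If the two merge() calls feel roundabout, the same table of pairs can be sketched in base R by indexing rows directly with the matrix returned by combn(). comb_pval_base() and its key argument are hypothetical names, not part of any package; for group 1 of the dput data it should reproduce the p-values shown further down (0.00322, 0.509, 0.0388).

```r
# Hypothetical helper: pairwise prop.test() p-values between all rows of `tab`,
# labelling each pair with the values of column `key`.
comb_pval_base <- function(tab, key) {
  idx <- combn(nrow(tab), 2)  # row-index pairs, one pair per column
  data.frame(
    level1 = tab[[key]][idx[1, ]],
    level2 = tab[[key]][idx[2, ]],
    pval   = apply(idx, 2, function(ij)
      prop.test(x = tab$number[ij], n = tab$total[ij])$p.value),
    stringsAsFactors = FALSE
  )
}

# Group 1 as it appears in the dput data (categories B, C, D)
grp1 <- data.frame(Category = c("B", "C", "D"),
                   number   = c(4, 16, 7),
                   total    = c(33, 33, 33),
                   stringsAsFactors = FALSE)
res1 <- comb_pval_base(grp1, "Category")
res1
```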

The function combPval is then applied to each group's nested data:

foo <- foo %>% mutate(results = map(data, combPval))

The results for the first group can be inspected with:

 foo$results[[1]]

 # A tibble: 3 x 7
 # Rowwise: 
   Category number.x total.x Category.y number.y total.y    pval
   <chr>       <int>   <int> <chr>         <int>   <int>   <dbl>
 1 B               4      33 C                16      33 0.00322
 2 B               4      33 D                 7      33 0.509  
 3 C              16      33 D                 7      33 0.0388 
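Note that foo is nested by Group, so combPval() compares categories within each group. The question asks for the transposed comparison, groups within each category (for A: 1 vs 2, 1 vs 3, 2 vs 3). A possible base-R sketch for that uses pairwise.prop.test(), which runs prop.test() on every pair of groups, collects the results in a separate long data frame, and then optionally merges them back onto the table as new columns. The names props, pairwise_groups(), props_with_p and the p_1_2-style pair labels are hypothetical. p.adjust.method = "none" gives raw p-values (a method such as "holm" would correct for multiple comparisons), and prop.test() will warn that the chi-squared approximation may be incorrect for pairs with small expected counts.

```r
# The question's table (hypothetical name `props`); `min` is left out
props <- data.frame(
  Group    = rep(1:3, each = 4),
  Category = rep(c("A", "B", "C", "D"), times = 3),
  number   = c(6, 4, 16, 7, 22, 17, 56, 38, 3, 3, 9, 7),
  total    = rep(c(33, 133, 22), each = 4),
  stringsAsFactors = FALSE
)

# All pairwise group comparisons for one category, as a long data frame
pairwise_groups <- function(d) {
  pw <- pairwise.prop.test(x = setNames(d$number, d$Group), n = d$total,
                           p.adjust.method = "none")
  m <- pw$p.value              # lower triangle filled, NA elsewhere
  keep <- !is.na(m)
  data.frame(Category = d$Category[1],
             group1   = colnames(m)[col(m)[keep]],
             group2   = rownames(m)[row(m)[keep]],
             pval     = m[keep],
             stringsAsFactors = FALSE)
}

# Option 1: a separate data frame, one row per category and pair of groups
result <- do.call(rbind, lapply(split(props, props$Category), pairwise_groups))
rownames(result) <- NULL
result

# Option 2: one new column per pair, merged onto every row of the table
result$pair <- paste0("p_", result$group1, "_", result$group2)
wide <- reshape(result[, c("Category", "pair", "pval")],
                idvar = "Category", timevar = "pair", direction = "wide")
props_with_p <- merge(props, wide, by = "Category")
```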
Lstat
Has my answer solved your problem? If so, please consider closing the question by accepting the answer. – Lstat Jun 21 '20 at 13:04