Calculate the probability that people with x "conditions" have the same conditions in R?

Question

I'm trying to understand the theory behind this and what the technique is called. I'd like to code it in R.

The dataset contains n people, each of whom can have up to z conditions.

For example, among the people who have exactly 3 conditions, I want to know which groups of conditions are most likely. Say Person A has conditions {1,2,3}, Person B has conditions {4,7,8}, and Person C has conditions {2,5,8}. I would like to show the most likely clusters of conditions such people have.

I would like to extend this to people with any number of conditions: 4, 5, and so on.

There doesn't seem to be any programming-specific question here. This seems like a better fit for [stats.se], where questions about statistics are on-topic. – MrFlick, Jun 14 '18 at 16:10
If I understood you correctly, the code below should suffice for the aggregation requirement. – Mankind_008, Jun 14 '18 at 19:53

Mankind_008 · Accepted Answer · 2018-06-14T17:59:27.003

To obtain the probabilities, group people who have the same set of conditions, then keep only the groups with the condition count you are interested in.

Assume the data frame `df` has one column per condition, where 1 means the person has the condition and 0 means they do not:

no_of_cond <- ncol(df)                                       # number of conditions
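
If the raw data is a list of condition IDs per person, as in the question, the 0/1 layout assumed here can be built first. This is a minimal sketch, not part of the original answer; the names `people`, `all_conditions` and `example_df` are hypothetical.

```r
# Hypothetical input: one vector of condition IDs per person (values from the question).
people <- list(A = c(1, 2, 3), B = c(4, 7, 8), C = c(2, 5, 8))

# All distinct conditions observed, sorted so the columns have a stable order.
all_conditions <- sort(unique(unlist(people)))

# One row per person, one 0/1 column per condition.
indicator <- t(sapply(people, function(p) as.integer(all_conditions %in% p)))
colnames(indicator) <- paste0("cond_", all_conditions)
example_df <- as.data.frame(indicator)

example_df
```

Each row of `example_df` sums to 3 here because every example person has exactly three conditions.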

Evaluate condition_set and condition_count for each individual:

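# Comma-separated names of the conditions each person has (NA if none).
df$condition_set <- apply(df, 1, function(x) {
  if (sum(x) > 0) paste(names(which(x == 1)), collapse = ", ") else NA_character_
})
# Number of conditions per person, computed from the original 0/1 columns only.
df$condition_count <- rowSums(df[, 1:no_of_cond])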

Grouping people with same conditions and filtering groups with same condition_count:

library(dplyr)

# Count people per distinct condition set, keeping only sets of size k.
# Note: relies on the global `df` built above.
case_count_df <- function(k) {
  df %>%
    group_by_all() %>%
    summarise(ppl_count = n()) %>%
    filter(condition_count == k)
}

Summary for people with 2 conditions (other counts can be obtained the same way):

df_2_cond <- case_count_df(2) %>% ungroup()
df_2_cond$prob <- df_2_cond$ppl_count/sum(df_2_cond$ppl_count)
plot(as.factor(df_2_cond$condition_set), df_2_cond$prob, xlab = 'condition_set', 
     ylab = 'probability', main = "People with 2 conditions")
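
The same conditional relative frequencies can also be computed with base R only. This is a sketch on a small made-up data set; `toy`, `cond_set`, `cond_count` and `set_probs` are hypothetical names, not part of the answer above.

```r
# Toy 0/1 data (hypothetical): 6 people, 3 conditions.
toy <- data.frame(a = c(1, 1, 1, 0, 1, 0),
                  b = c(1, 1, 0, 1, 1, 0),
                  c = c(0, 0, 1, 1, 1, 0))

# Label each person with the set of conditions they have.
cond_set <- apply(toy, 1, function(x) paste(names(toy)[x == 1], collapse = ", "))
cond_count <- rowSums(toy)

# Relative frequency of each condition set among people with exactly k conditions.
set_probs <- function(k) prop.table(table(cond_set[cond_count == k]))

set_probs(2)  # a, b: 0.50; a, c: 0.25; b, c: 0.25
```

These are observed proportions within the group of people who have exactly k conditions, so they sum to 1 for each k.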

Dummy data (define this first; the code above expects `df` to exist):

df <- data.frame(expand.grid( a = rep(c(0,1),2), b = rep(0,3), 
                              c = c(0,1,0), d = c(0,0,1) ))

PS: All of the above is basic aggregation. For statistical tests or inference, Cross Validated would be a better forum.

score 0 · Answer 2 · answered Jun 16 '18 at 18:26

You probably are looking for frequent itemsets.

In your case the items are conditions, so you want frequent sets of conditions.
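
To make the idea concrete, here is a brute-force sketch in base R that computes the support of every set of k conditions, that is, the share of people who have all k of them. It is only an illustration on made-up data: `toy` and `itemset_support` are hypothetical names, and dedicated frequent-itemset algorithms such as Apriori avoid enumerating every combination.

```r
# Toy 0/1 data (hypothetical): rows are people, columns are conditions.
toy <- data.frame(a = c(1, 1, 1, 0, 1, 0),
                  b = c(1, 1, 0, 1, 1, 0),
                  c = c(0, 0, 1, 1, 1, 0))

# Support of every itemset of size k: the share of people who have all k
# conditions (possibly alongside others). Brute force over combn(); fine for
# a handful of conditions, too slow for many.
itemset_support <- function(data, k) {
  sets <- combn(names(data), k, simplify = FALSE)
  support <- vapply(sets, function(s) mean(rowSums(data[s]) == k), numeric(1))
  names(support) <- vapply(sets, paste, character(1), collapse = ", ")
  sort(support, decreasing = TRUE)
}

itemset_support(toy, 2)
```

Note that support counts everyone who has at least those conditions, whereas grouping by the exact condition set only counts people who have those conditions and no others.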

Has QUIT--Anony-Mousse

Thank you, you are correct. I ended up using the `arules` and `arulesViz` packages to achieve what I was looking for. – rahulkalluri Jul 09 '18 at 22:58
