
I'm trying to understand the theory behind this problem and what it is called. I'd like to code it in R.

The dataset contains n people, each of whom can have up to z conditions.

For example, among the people who have exactly 3 conditions, I want to know which groups of conditions are most likely. Person A has conditions {1,2,3}, Person B has {4,7,8}, and Person C has {2,5,8}; I would like to show the most likely clusters of conditions among such people.

I would like to extend this to people with any number of conditions: 4, 5, and so on.

2 Answers


To obtain the probabilities, group people who have exactly the same set of conditions, then keep only the groups with the condition count you are interested in.

Assume df has one row per person and one column per condition, where 1 means the person has the condition and 0 means they do not:

no_of_cond <- ncol(df)                                       # number of conditions
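If the raw data arrive as one set of conditions per person, as in the question's example, the 0/1 layout assumed here has to be built first. A minimal base-R sketch; the names `conditions`, `all_conds` and `ind_df` are made up for illustration and are not used elsewhere in this answer:

```r
# Hypothetical input: one vector of condition IDs per person (the question's example).
conditions <- list(A = c(1, 2, 3), B = c(4, 7, 8), C = c(2, 5, 8))

# Every distinct condition becomes one column.
all_conds <- sort(unique(unlist(conditions)))

# One row per person: 1 if the person has the condition, 0 otherwise.
ind <- t(sapply(conditions, function(s) as.integer(all_conds %in% s)))
colnames(ind) <- paste0("cond_", all_conds)
ind_df <- as.data.frame(ind)

ind_df
```

A data frame shaped like `ind_df` could then play the role of `df` in the steps that follow.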

Evaluate condition_set and condition_count for each individual:

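# Restrict to the condition columns so the step can be re-run safely
df$condition_set <- apply(df[, seq_len(no_of_cond), drop = FALSE], 1, function(x) {
  # Join the names of the conditions this person has; NA if they have none
  if (sum(x) > 0) paste(names(which(x == 1)), collapse = ", ") else NA_character_
})
df$condition_count <- rowSums(df[, seq_len(no_of_cond), drop = FALSE])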

Grouping people with same conditions and filtering groups with same condition_count:

library(dplyr)

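# k = number of conditions; named k so it is not confused with dplyr's n()
case_count_df <- function(k) {
  df %>%
    filter(condition_count == k) %>%               # keep people with exactly k conditions
    group_by(across(everything())) %>%             # one group per distinct row
    summarise(ppl_count = n(), .groups = "drop")   # number of people in each group
}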

Summary for people with 2 conditions; other counts can be obtained the same way:

df_2_cond <- case_count_df(2) %>% ungroup()
df_2_cond$prob <- df_2_cond$ppl_count/sum(df_2_cond$ppl_count)
plot(as.factor(df_2_cond$condition_set), df_2_cond$prob, xlab = 'condition_set', 
     ylab = 'probability', main = "People with 2 conditions")
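The same summary can also be produced for every condition count at once. A base-R sketch, using a small hypothetical data frame `toy` (not the dummy data of this answer) so that it runs on its own:

```r
# Hypothetical 0/1 indicator data: one row per person, one column per condition.
toy <- data.frame(a = c(1, 1, 0, 1, 1),
                  b = c(1, 1, 1, 0, 1),
                  c = c(0, 0, 1, 1, 1))

# Label each person by the exact set of conditions they have, and by its size.
set_label <- apply(toy, 1, function(x) paste(names(toy)[x == 1], collapse = ", "))
set_size  <- rowSums(toy)

# For each size, the share of people holding each exact set.
prob_by_size <- lapply(split(set_label, set_size), function(s) prop.table(table(s)))

prob_by_size[["2"]]   # distribution among people with exactly 2 conditions
```

Each element of `prob_by_size` sums to 1, because the shares are taken within the group of people who have that many conditions.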

[Plot: probability of each condition_set for people with 2 conditions]

Dummy data (define this before running the code above):

df <- data.frame(expand.grid( a = rep(c(0,1),2), b = rep(0,3), 
                              c = c(0,1,0), d = c(0,0,1) ))

PS: All of the above is basic aggregation. For statistical tests or inference, Cross Validated would be a better forum.

Mankind_008

You are probably looking for frequent itemsets.

In your case the items are conditions, so you want frequent sets of conditions.
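To make the term concrete: the support of a set of conditions is the share of people who have all of them, possibly alongside other conditions, and a set is "frequent" when its support reaches a chosen threshold. Below is a brute-force base-R sketch of that definition. The matrix `m`, the helper `itemset_support` and the 50% threshold are hypothetical; dedicated algorithms such as Apriori or Eclat avoid enumerating every combination and scale much better.

```r
# Hypothetical 0/1 indicator matrix: rows are people, columns are conditions.
m <- rbind(c(1, 1, 1, 0),
           c(1, 1, 0, 1),
           c(1, 1, 1, 1),
           c(0, 1, 1, 0))
colnames(m) <- c("c1", "c2", "c3", "c4")

# Support of every k-condition set: share of people who have all k conditions.
# Brute force over combn(); only practical for a modest number of columns.
itemset_support <- function(m, k) {
  sets <- combn(colnames(m), k, simplify = FALSE)
  supp <- vapply(sets, function(s) mean(rowSums(m[, s, drop = FALSE]) == k),
                 numeric(1))
  names(supp) <- vapply(sets, paste, character(1), collapse = ", ")
  sort(supp, decreasing = TRUE)
}

freq2 <- itemset_support(m, 2)
freq2[freq2 >= 0.5]   # pairs at or above the hypothetical 50% support threshold
```

Note that this counts people who have at least the given conditions, which is a different question from grouping people by their exact set of conditions.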

Has QUIT--Anony-Mousse
  • Thank you, you are correct. I ended up using the `arules` and `arulesViz` packages to achieve what I was looking for. – rahulkalluri Jul 09 '18 at 22:58