Return the frequency of counts of a condition, leaving other conditions intact

Question

Given the data frame below, I would like to return the frequency of obs within each session, leaving the other conditions (cond1, cond2, cond3) intact.

session  obs  cond1  cond2  cond3
      1    A  close     30      0
      1    A   open     30      0
      1    A  close     30      0
      1    B  close     30     10
      1    C  close     27      2
      2    A  close     30      1
      2    A  close     30      6
      2    A  close     30      6
      2    A  close     30      6
      2    B  close     30      2

The data table has 4921 rows. cond1 and cond3 are of class character; cond2 is integer.

By running trial <- count(data, data$'obs', data$'cond1', data$session) I can see how many obs there are per session in cond1, for example.

But I need the following output to compare the frequencies of the observation in each condition:

session  obs  freq_obs  cond1  cond2  cond3
      1    A       0.4  close     30      0
      1    A       0.2   open     30      0
      1    B       0.2  close     30     10
      1    C       0.2  close     27      2
      2    A       0.6  close     30      6
      2    B       0.2  close     30      2
      2    A       0.2  close     30      1

Being a noob in R, I tried using

ddply(data,c("session","obs","cond1", "cond2", "cond3"),summarize,freq=sum(obs/session))

but this is clearly an oversimplification, and the other examples of similar questions I found here did not solve my problem either.

Any ideas? Many thanks!!

I see, I am sorry Shawn. I used a dummy dataset for the sake of simplification because I have more conditions than those I showed here, so I thought it would be too messy. Luckily, my problem was solved by Allan, but now I know and next time I will share the real dataset! Thank you for letting me know and showing me how to do it! — Sofia, Feb 03 '23 at 06:52

Allan Cameron · Answer 1 · 2023-02-02T11:08:01.100

It seems that you are using the plyr package, which has been retired and superseded by dplyr. Using dplyr, you can group_by_all, then count the number of identical rows across all columns. Each count divided by the total count in its session gives your desired result.

library(dplyr)

data %>%
  group_by_all() %>%
  summarize(prop = n(), .groups = 'drop') %>%
  group_by(session) %>%
  mutate(prop = prop / sum(prop)) %>%
  ungroup()
#> # A tibble: 7 x 6
#>   session obs   cond1 cond2 cond3  prop
#>     <int> <chr> <chr> <int> <int> <dbl>
#> 1       1 A     close    30     0   0.4
#> 2       1 A     open     30     0   0.2
#> 3       1 B     close    30    10   0.2
#> 4       1 C     close    27     2   0.2
#> 5       2 A     close    30     1   0.2
#> 6       2 A     close    30     6   0.6
#> 7       2 B     close    30     2   0.2
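
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It rebuilds the ten example rows from the question as a data frame (the numeric column types are an assumption; the question reports different classes for the real data) and uses dplyr's count() in place of group_by_all() plus summarize(). This is a variation on the approach above, not a replacement for it; as far as I know, group_by_all() is marked as superseded in recent dplyr releases, so naming the columns explicitly may age better. The names result and n are only illustrative.

```r
library(dplyr)

# Example rows copied from the question; column types are assumed here.
data <- data.frame(
  session = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  obs     = c("A", "A", "A", "B", "C", "A", "A", "A", "A", "B"),
  cond1   = c("close", "open", "close", "close", "close",
              "close", "close", "close", "close", "close"),
  cond2   = c(30, 30, 30, 30, 27, 30, 30, 30, 30, 30),
  cond3   = c(0, 0, 0, 10, 2, 1, 6, 6, 6, 2)
)

result <- data %>%
  # One row per distinct combination of all columns, with its count in n
  count(session, obs, cond1, cond2, cond3, name = "n") %>%
  # Turn the counts into proportions within each session
  group_by(session) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

print(result)
```

With more columns than shown here, listing them all in count() gets tedious; in that case grouping with group_by(across(everything())) followed by summarize(n = n(), .groups = "drop") should give the same counts.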

Got it! This is why I was being given so many error warnings! That just solved my problem! Thank you! — Sofia, Feb 03 '23 at 06:45
