I have data in R with two columns of logical values that looks like this:

check1 = as.logical(c(rep("TRUE",3),rep("FALSE",2),rep("TRUE",3),rep("FALSE",2)))
check2 = as.logical(c(rep("TRUE",5),rep("FALSE",2),rep("TRUE",3)))
dat = cbind(check1,check2)

which results in:

    check1 check2
 [1,]   TRUE   TRUE
 [2,]   TRUE   TRUE
 [3,]   TRUE   TRUE
 [4,]  FALSE   TRUE
 [5,]  FALSE   TRUE
 [6,]   TRUE  FALSE
 [7,]   TRUE  FALSE
 [8,]   TRUE   TRUE
 [9,]  FALSE   TRUE
[10,]  FALSE   TRUE

I want to calculate the running percentage of TRUEs in each column, which ideally should look like this:

check1 check2
1/1 1/1
2/2 2/2
3/3 3/3
3/4 4/4
3/5 5/5
4/6 5/6
5/7 5/7
6/8 6/8
6/9 7/9
6/10 8/10

Maybe something like this?

dat%>%
  mutate(cumsum(check1)/seq_along(check1))

Any help?

Homer Jay Simpson

2 Answers

You are almost there; just use across to apply your function to both columns.

Alternatively, you can use dplyr::cummean to compute the running proportions.

A note about terminology: rolling usually refers to computing a statistic (such as the mean or the max) within a fixed-size window. Cumulative statistics, on the other hand, are computed in an ever-increasing window starting from index 1 (or the first row). See the vignette on window functions. Using the right term may help you search the documentation for the appropriate function.
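To make the distinction concrete, here is a minimal base R sketch using the check1 values from the question. The window size `k = 3` is an arbitrary choice for illustration, and `embed()` is just one of several ways to build fixed-size windows.

```r
# check1 from the question, as 0/1 values
x <- as.numeric(rep(c(TRUE, FALSE, TRUE, FALSE), times = c(3, 2, 3, 2)))

# Cumulative mean: the window always starts at index 1 and keeps growing
cumulative <- cumsum(x) / seq_along(x)

# Rolling mean: the window has a fixed size k (assumed to be 3 here);
# each row of embed(x, k) holds one window of k consecutive values
k <- 3
rolling <- rowMeans(embed(x, k))

cumulative
rolling
```

The cumulative version returns one value per element, while the rolling version returns `length(x) - k + 1` values because the first full window only ends at index `k`.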

library("tidyverse")

check1 <- as.logical(c(rep("TRUE", 3), rep("FALSE", 2), rep("TRUE", 3), rep("FALSE", 2)))
check2 <- as.logical(c(rep("TRUE", 5), rep("FALSE", 2), rep("TRUE", 3)))
dat <- cbind(check1, check2)

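# Option 1: running count of TRUEs divided by the row number.
# `dat` is a matrix, so convert it to a tibble before calling mutate().
cummeans <- as_tibble(dat) %>%
  mutate(
    across(c(check1, check2), ~ cumsum(.) / row_number())
  )

# Option 2: the same result with dplyr::cummean (this overwrites option 1).
cummeans <- as_tibble(dat) %>%
  mutate(
    across(c(check1, check2), cummean)
  )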
cummeans
#> # A tibble: 10 × 2
#>    check1 check2
#>     <dbl>  <dbl>
#>  1  1      1    
#>  2  1      1    
#>  3  1      1    
#>  4  0.75   1    
#>  5  0.6    1    
#>  6  0.667  0.833
#>  7  0.714  0.714
#>  8  0.75   0.75 
#>  9  0.667  0.778
#> 10  0.6    0.8

# Plot the cumulative proportions on the y-axis, with one panel for each check
cummeans %>%
  # The example data has no index column; will use the row ids instead
  rowid_to_column() %>%
  pivot_longer(
    c(check1, check2),
    names_to = "check",
    values_to = "cummean"
  ) %>%
  ggplot(
    aes(rowid, cummean, color = check)
  ) +
  geom_line() +
  # Proportions have a natural range from 0 to 1
  scale_y_continuous(
    limits = c(0, 1)
  )

Created on 2022-03-14 by the reprex package (v2.0.1)

dipetkov
  • One last thing: how can I plot them with ggplot on the y-axis, with the facets appearing in the upper right corner? – Homer Jay Simpson Mar 14 '22 at 11:03
  • I'm not sure whether you'd like the panels to be aligned along the x-axis [in that case use `rows = vars(check)` ] or along the y-axis [in that case use `cols = vars(check)`]. – dipetkov Mar 14 '22 at 11:12
  • I misspoke. I meant a single ggplot with a simple legend (I thought facets were the same thing, sorry about that). – Homer Jay Simpson Mar 14 '22 at 11:15
  • More terminology, fun. So panels (or facets) are small multiple plots next to each other, with shared axes. Of course, you can plot the running proportions in the same plot instead. – dipetkov Mar 14 '22 at 11:29
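
The two layouts discussed in these comments can be sketched as follows. This is only an illustration: it rebuilds the long-format data in base R instead of reusing the tibble from the answer, and the names `long`, `p_single` and `p_facets` are made up for this example.

```r
library(ggplot2)

check1 <- rep(c(TRUE, FALSE, TRUE, FALSE), times = c(3, 2, 3, 2))
check2 <- rep(c(TRUE, FALSE, TRUE), times = c(5, 2, 3))
n <- length(check1)

# Long format: one row per (row id, check) pair
long <- data.frame(
  rowid = rep(seq_len(n), times = 2),
  check = rep(c("check1", "check2"), each = n),
  cummean = c(cumsum(check1) / seq_len(n), cumsum(check2) / seq_len(n))
)

# One plot: both checks share a panel and are told apart by the color legend
p_single <- ggplot(long, aes(rowid, cummean, color = check)) +
  geom_line() +
  scale_y_continuous(limits = c(0, 1))

# Facets: one panel per check, stacked in rows
# (use cols = vars(check) to place the panels side by side instead)
p_facets <- ggplot(long, aes(rowid, cummean)) +
  geom_line() +
  facet_grid(rows = vars(check)) +
  scale_y_continuous(limits = c(0, 1))
```

Printing `p_single` or `p_facets` draws the corresponding plot.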

1) This gives the result as fractions.

library(zoo)

rollapplyr(dat, 1:nrow(dat), mean)
##          check1    check2
##  [1,] 1.0000000 1.0000000
##  [2,] 1.0000000 1.0000000
##  [3,] 1.0000000 1.0000000
##  [4,] 0.7500000 1.0000000
##  [5,] 0.6000000 1.0000000
##  [6,] 0.6666667 0.8333333
##  [7,] 0.7142857 0.7142857
##  [8,] 0.7500000 0.7500000
##  [9,] 0.6666667 0.7777778
## [10,] 0.6000000 0.8000000

1a) To get a percentage multiply that by 100:

100 * rollapplyr(dat, 1:nrow(dat), mean)

2) or using only base R:

apply(dat, 2, cumsum) / row(dat)

2a) or as a percentage

100 * apply(dat, 2, cumsum) / row(dat)
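
As a rough sketch of why the base R version works, and of how the same pieces could give the "count/total" labels shown in the question: `row(dat)` is a matrix with the same shape as `dat` whose entries are the row indices, so dividing the running counts by it gives the running proportions. The names `counts`, `sizes`, `props` and `labels` below are chosen only for this example.

```r
check1 <- rep(c(TRUE, FALSE, TRUE, FALSE), times = c(3, 2, 3, 2))
check2 <- rep(c(TRUE, FALSE, TRUE), times = c(5, 2, 3))
dat <- cbind(check1, check2)

counts <- apply(dat, 2, cumsum)  # running number of TRUEs in each column
sizes <- row(dat)                # sizes[i, j] == i, the window length so far
props <- counts / sizes          # running proportions, same as in 2)

# Character matrix of "count/total" labels; `labels[] <-` keeps the
# dimensions and column names of `counts`
labels <- counts
labels[] <- paste0(counts, "/", sizes)
labels
```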
G. Grothendieck