I would like to calculate the mean and standard deviation of a column in a table, but cumulatively: recalculated each time a new observation is added. For example:

library(tidyverse)

aa <- data.frame(aa = c(2, 3, 4, 5, 6, 7, 8)) %>%
  mutate(aa1 = cumsum(aa), li = 1:n()) %>%
  mutate(MeanAA = aa1/li)


aa = c(2, 3, 4, 5, 6, 7, 8)

mean(aa[1:2])
mean(aa[1:3])

sd(aa[1:2])
sd(aa[1:3])

I could do it for the mean but not for the SD. I would like to see how the SD changes relative to the mean as the number of observations increases.

Mateusz1981

1 Answer

How about this:

aa <- c(2, 3, 4, 5, 6, 7, 8)

for (i in 2:length(aa)) {
  mn <- mean(aa[1:i])
  ss <- sd(aa[1:i])
  cat(sprintf("1-%i\tMean: %.2f\tSD: %.2f\n", i, mn, ss))
}
#> 1-2  Mean: 2.50  SD: 0.71
#> 1-3  Mean: 3.00  SD: 1.00
#> 1-4  Mean: 3.50  SD: 1.29
#> 1-5  Mean: 4.00  SD: 1.58
#> 1-6  Mean: 4.50  SD: 1.87
#> 1-7  Mean: 5.00  SD: 2.16

Created on 2022-06-01 by the reprex package (v2.0.1)

If you need the values in a data.frame, you can compute them inside mutate() like so:

library(tidyverse)
tibble(aa = c(2, 3, 4, 5, 6, 7, 8)) %>%
  mutate(
    running_mean = sapply(seq(n()), function(i) mean(aa[seq(i)])),
    running_sd = sapply(seq(n()), function(i) sd(aa[seq(i)])),
  )
#> # A tibble: 7 x 3
#>      aa running_mean running_sd
#>   <dbl>        <dbl>      <dbl>
#> 1     2          2       NA    
#> 2     3          2.5      0.707
#> 3     4          3        1    
#> 4     5          3.5      1.29 
#> 5     6          4        1.58 
#> 6     7          4.5      1.87 
#> 7     8          5        2.16
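
As a side note, the `sapply()` version recomputes `sd()` on a growing slice for every row. A minimal sketch of a fully vectorized alternative, in the spirit of the `cumsum()` approach from the question, is shown below. It assumes the usual sample variance identity (sum of squares minus n times the squared mean, divided by n - 1); the names `x`, `running_var`, and so on are only illustrative. This shortcut can lose precision when the values are large relative to their spread, so treat it as an illustration rather than a drop-in replacement.

```r
x <- c(2, 3, 4, 5, 6, 7, 8)
n <- seq_along(x)

# Running mean from cumulative sums, as in the question
running_mean <- cumsum(x) / n

# Running sample variance: (sum(x^2) - n * mean^2) / (n - 1)
running_var <- (cumsum(x^2) - n * running_mean^2) / (n - 1)

# Clamp tiny negative values caused by rounding before taking the square root
running_sd <- sqrt(pmax(running_var, 0))

# The SD of a single observation is undefined, matching sd(x[1])
running_sd[1] <- NA_real_

data.frame(aa = x, running_mean, running_sd)
```

For the seven values used here this should agree with the `sd(aa[seq(i)])` column above.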

David
  • Sorry, I updated the question. Can one use this in a table? – Mateusz1981 Jun 01 '22 at 08:14
  • Can one randomize the observations in variable 'aa' and calculate several means (or the mean of several randomizations)? – Mateusz1981 Aug 02 '22 at 09:46
  • At the moment we take `seq(i)` (e.g. for i = 3 that is 1, 2, 3), but you can also sample (roughly, bootstrap) the observations by replacing `seq(i)` with, for example, `sample.int(N, i, replace = TRUE)`, where N is the number of values in the vector you sample from – David Aug 02 '22 at 10:36
  • Thanks @David. That is partly what I would like to achieve. I will try to explain myself better: I would like to randomize all my observations, for example 100 times. For each run I would like to calculate a cumulative mean and SD. At the end I would like to get the mean cumulative values over these 100 runs – Mateusz1981 Aug 03 '22 at 05:25
  • In that case you would loop over your dataset (for example with lapply), repeat the sampling N times, and apply the solution above each time. But this is getting to be more than I can write in a comment. Feel free to post a new question and link it here – David Aug 03 '22 at 11:12
  • https://stackoverflow.com/questions/73387492/randomization-of-vector-and-average-mean @David – Mateusz1981 Aug 17 '22 at 11:14
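
The randomization discussed in the comments above could be sketched roughly as follows. This is only one possible reading of the request, not the solution from the linked question: each run shuffles the vector without replacement using `sample()`, computes the cumulative mean and SD, and the runs are then averaged position by position. The helper `running_stats()`, the seed, and `n_runs <- 100` are assumptions made for illustration.

```r
set.seed(42)  # assumed seed, only to make the sketch reproducible

x <- c(2, 3, 4, 5, 6, 7, 8)
n_runs <- 100  # number of randomizations, as in the comment

# Hypothetical helper: cumulative mean and SD of a vector, one row per position
running_stats <- function(v) {
  idx <- seq_along(v)
  cbind(
    mean = cumsum(v) / idx,
    sd   = sapply(idx, function(i) sd(v[seq_len(i)]))
  )
}

# Each run shuffles x; replicate() stacks the results
# into a length(x) x 2 x n_runs array
runs <- replicate(n_runs, running_stats(sample(x)))

# Average over the runs for every position and statistic
avg <- apply(runs, c(1, 2), mean)
avg
```

The first row of the `sd` column stays `NA` because the SD of a single value is undefined. To bootstrap instead of shuffle, `sample(x)` could be swapped for `sample(x, replace = TRUE)`.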