Mean and Standard Deviation of x>=5 of 10000 data points binomial(10, 1/4)

Question

I have a data range of 10,000 points as per:

data = rbinom(10000, size=10, prob=1/4)

I need to find the mean and standard deviation of the data values >=5.

There are approx 766 values as per:

sum(data >=5)

The comparison (like every other approach I can think of) produces TRUE/FALSE values rather than the values themselves, so I can't use it directly in a mean or sd calculation. How do I extract the actual values?

Allan Cameron · Accepted Answer · 2022-05-14T11:27:55.797

2

To get the values of data that are greater than or equal to 5, rather than a logical vector telling you which values meet that condition, subset the vector with data[data >= 5].

So we can do:

data = rbinom(10000, size=10, prob=1/4)

mean(data[data >= 5])
#> [1] 5.298153

sd(data[data >= 5])
#> [1] 0.5567141
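
As a minimal sketch of the logical indexing used above, here is the same pattern on a small made-up vector `x` (hypothetical, not part of the original data) so the results can be checked by hand:

```r
x <- c(2, 5, 7, 3, 6)   # hypothetical example values

keep <- x >= 5          # logical vector: FALSE TRUE TRUE FALSE TRUE
vals <- x[keep]         # keeps only the elements where keep is TRUE: 5 7 6

mean(vals)              # 6
sd(vals)                # 1
```

The comparison builds the TRUE/FALSE mask, and the square brackets use that mask to pull out the matching values, which can then go into mean() and sd() as usual.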

edited May 14 '22 at 11:27

answered May 14 '22 at 11:22

Allan Cameron


score 0 · Answer 2 · answered May 14 '22 at 11:14

0

Maybe try this:

library(dplyr)

# Name the column explicitly instead of relying on the pipe placeholder `.`
data.frame(value = data) %>%
  filter(value >= 5) %>%
  summarise(mean = mean(value),
            sd = sd(value))

Output:

      mean        sd
1 5.297092 0.5815554

Data

data = rbinom(10000, size=10, prob=1/4)

answered May 14 '22 at 11:14

Quinten


DaveArmstrong · Answer 3 · 2022-05-14T11:34:04.933

0

The TRUE and FALSE values can be used in mean(), sum(), sd(), etc., as they are coerced to the numeric values 1 and 0, respectively.

set.seed(456)
data = rbinom(10000, size=10, prob=1/4)
mean(data >= 5)
#> [1] 0.0779
sum(data >= 5)
#> [1] 779
sd(data >= 5)
#> [1] 0.2680276

Created on 2022-05-14 by the reprex package (v2.0.1)
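
A small sketch of the coercion described above, using a made-up logical vector `flags` (hypothetical). The `pbinom` call is an added cross-check, not part of the original answer: it gives the theoretical probability of drawing a value of 5 or more, for comparison with the simulated proportion.

```r
flags <- c(TRUE, FALSE, TRUE, TRUE)  # hypothetical example vector

as.numeric(flags)   # TRUE becomes 1, FALSE becomes 0
sum(flags)          # number of TRUE values: 3
mean(flags)         # proportion of TRUE values: 0.75

# Theoretical P(X >= 5) for X ~ Binomial(10, 1/4), computed as P(X > 4)
p_tail <- pbinom(4, size = 10, prob = 1/4, lower.tail = FALSE)
p_tail              # about 0.078
```

That probability is in line with the simulated counts reported in this thread (roughly 766 and 779 out of 10,000).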

edited May 14 '22 at 11:34

answered May 14 '22 at 11:20

DaveArmstrong


I read the question as getting the mean and sd of all the values in data that are greater than or equal to 5 – Allan Cameron May 14 '22 at 11:23
@AllanCameron sorry, I read it as wanting the `mean(data >= 5)` rather than `mean(data[data >= 5])`. I'll leave the answer here for now, but re-reading the question, I suspect you're right. – DaveArmstrong May 14 '22 at 11:34
