I'm an education researcher trying to simulate a dataset of test scores. If I know

  1. the score for a test has a min of 400 and a max of 800
  2. scores must be whole numbers
  3. different subgroups have different historical mean and sd values

What is the best way to simulate a column of those values? Assume I have something like the following:

ID  Race  Group_Mean_Score  Group_SD_Score  Sim_Score
1   B     600               37.5            ?
2   B     600               37.5            ?
3   A     630               24.3            ?

Now... what if the individual scores can only be in increments of 10?

My initial thought was to go to rnorm() (yes, assuming normally distributed scores), but I want to force it to give integers, if possible in increments of 10, and to simulate given different distributional properties by group. That's beyond me right now.

Any help is appreciated.

Konrad Rudolph

1 Answer

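For 100 students drawn from a distribution with a mean of 100 and a standard deviation of 10, with scores rounded to the nearest multiple of 10, you could use the following:

mu <- 100     # mean
sigma <- 10   # standard deviation
N <- 100      # number of students
# divide by 10, round to a whole number, then scale back up
sims <- round(rnorm(N, mu, sigma) / 10) * 10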
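
The rounding step can be checked on fixed values. In this small sketch the vector `x` is made up for illustration (it stands in for raw `rnorm()` draws); plain `round()` gives whole numbers, while dividing by 10 before rounding and multiplying back gives multiples of 10:

```r
# Hypothetical raw draws, fixed so the result is reproducible
x <- c(587.2, 601.7, 643.9, 596.1)

whole <- round(x)            # nearest whole number
tens  <- round(x / 10) * 10  # nearest multiple of 10

whole  # 587 602 644 596
tens   # 590 600 640 600
```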

Given the example above, you can do all of this inside of your data:

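library(dplyr)
# one row per student, with that student's group mean and sd
df <- tribble(
    ~ID, ~Race, ~Group_Mean_Score, ~Group_SD_Score,
    1, "B", 600, 37.5,
    2, "B", 600, 37.5,
    3, "A", 630, 24.3
)
# rnorm() is vectorised over mean and sd, so each row uses its own group's values
df %>%
    mutate(Sim_Score = rnorm(n(), Group_Mean_Score, Group_SD_Score))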

# A tibble: 3 x 5
     ID Race  Group_Mean_Score Group_SD_Score Sim_Score
  <dbl> <chr>            <dbl>          <dbl>     <dbl>
1     1 B                  600           37.5      571.
2     2 B                  600           37.5      638.
3     3 A                  630           24.3      654.
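
To get increments of 10 and keep scores inside the 400 to 800 range within the data frame, one option is to round each draw to the nearest 10 and then clamp it with `pmax()` and `pmin()`. This is a minimal sketch rather than a definitive approach: the helper name `sim_scores`, its arguments, and the seed are all assumptions made for illustration. Clamping piles any out-of-range draws onto the boundary values; with group means near 600 and SDs under 40 that should be rare, but a truncated normal would likely be a better fit if a group's mean sat close to a limit.

```r
library(dplyr)

# Hypothetical helper: draw one score per row, round to the nearest
# multiple of `step`, then clamp the result to [lo, hi].
sim_scores <- function(mean, sd, lo = 400, hi = 800, step = 10) {
  raw <- rnorm(length(mean), mean, sd)  # vectorised over mean and sd
  rounded <- round(raw / step) * step   # nearest multiple of `step`
  pmin(pmax(rounded, lo), hi)           # force values into [lo, hi]
}

df <- tribble(
  ~ID, ~Race, ~Group_Mean_Score, ~Group_SD_Score,
  1, "B", 600, 37.5,
  2, "B", 600, 37.5,
  3, "A", 630, 24.3
)

set.seed(42)  # arbitrary seed, only so the run is reproducible
sim <- df %>%
  mutate(Sim_Score = sim_scores(Group_Mean_Score, Group_SD_Score))
sim
```

Because the rounding and the limits are arguments, the same helper covers the whole-number case with `step = 1`.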
mikebader
  • Ah. So I can assign mu and sigma by subgroup. In your example, does `rnorm()` know to pull from the mu and sigma objects you defined? I don't see them after you defined them. Also, do you have any suggestions for forcing the limits of the output? – Jeffrey Harding Aug 24 '21 at 12:46
  • Doh! I just fixed it. And, yes, you can define mu and sigma by subgroup. I've added an example to show how you can do it inside your existing data. – mikebader Aug 24 '21 at 12:55