I'm an education researcher trying to simulate a dataset of test scores. If I know
- the score for a test has a minimum of 400 and a maximum of 800
- scores must be whole numbers
- different subgroups have different historical means and standard deviations
What is the best way to simulate a column of those values? Assume I have something like the following:
ID | Race | Group_Mean_Score | Group_SD_Score | Sim_Score |
---|---|---|---|---|
1 | B | 600 | 37.5 | ? |
2 | B | 600 | 37.5 | ? |
3 | A | 630 | 24.3 | ? |
Now... what if the individual scores can only be in increments of 10?
My initial thought was to use rnorm() (yes, assuming normally distributed scores), but I want to force it to return integers, ideally in increments of 10, and to draw each score using its own group's mean and SD. That's beyond me right now.
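For concreteness, here is a rough sketch of the kind of thing I am imagining, not something I am confident is the right approach. It assumes normality, rounds each draw to the nearest multiple of 10, and clamps anything outside 400 to 800 back to the nearest bound. The helper name sim_scores and its arguments (lo, hi, step) are made up for illustration.

```r
# Hypothetical helper (not from any package): draw one score per row from
# that row's group mean/SD, round to a multiple of `step`, clamp to [lo, hi].
sim_scores <- function(mean, sd, lo = 400, hi = 800, step = 10) {
  # rnorm() is vectorised over mean and sd, so each row uses its own group values
  raw <- rnorm(length(mean), mean = mean, sd = sd)
  # Round to the nearest multiple of `step` (use step = 1 for plain whole numbers)
  stepped <- round(raw / step) * step
  # Clamp out-of-range draws to the bounds and return integers
  as.integer(pmin(pmax(stepped, lo), hi))
}

set.seed(42)  # only so the example is reproducible

df <- data.frame(
  ID = 1:3,
  Race = c("B", "B", "A"),
  Group_Mean_Score = c(600, 600, 630),
  Group_SD_Score = c(37.5, 37.5, 24.3)
)

df$Sim_Score <- sim_scores(df$Group_Mean_Score, df$Group_SD_Score)
df
```

My worry with this is that rounding and clamping both distort the distribution a little: clamping piles up values at exactly 400 and 800 for groups whose mean is near a bound, and the simulated group SD will not exactly match the historical one. I don't know whether that matters in practice or whether there is a better-behaved alternative.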
Any help is appreciated.