I have a data set with mean values, standard deviations and sample sizes (n). One of the variables has equal sample sizes across its replicates, while the sample sizes for the other one vary.
dat <- data.frame(
  variable  = c(rep("x", 2), rep("y", 3)),
  replicate = c(1, 2, 1, 2, 3),
  mean      = c(3.4, 2.5, 6.5, 5.7, 5.1),
  sd        = c(1.2, 0.7, 2.4, 4.0, 3.5),
  n         = c(3, 3, 5, 4, 6)
)
I need to combine the replicates of the x and y variables and am trying to find a concise way to calculate the combined standard deviation, for instance using the aggregate function. The equation for the combined standard deviation is the following:
And for unequal sample sizes (same source):
My combined data frame should look like this:
variable  mean  sd
x         2.95  sd_x
y         5.76  sd_y
How can I write a function in R that calculates the combined standard deviation? Alternatively, if there is a package designed for this, that counts as an answer too =)
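One possible starting point, as a minimal sketch only: it assumes the formula meant here is the usual pooled standard deviation, sqrt(sum((n_i - 1) * s_i^2) / sum(n_i - 1)), which for equal n reduces to the square root of the mean variance. That choice is an assumption, and this formula ignores differences between the replicate means. The helper name pooled_sd is hypothetical, and the mean column is taken as the plain average of the replicate means, as in the desired table.

```r
# Example data, repeated so the sketch runs on its own.
dat <- data.frame(
  variable  = c(rep("x", 2), rep("y", 3)),
  replicate = c(1, 2, 1, 2, 3),
  mean      = c(3.4, 2.5, 6.5, 5.7, 5.1),
  sd        = c(1.2, 0.7, 2.4, 4.0, 3.5),
  n         = c(3, 3, 5, 4, 6)
)

# Hypothetical helper (assumed formula): pooled SD, weighting each
# replicate's variance by its degrees of freedom (n - 1).
pooled_sd <- function(sd, n) {
  sqrt(sum((n - 1) * sd^2) / sum(n - 1))
}

# Split the rows by variable and build one summary row per group.
combined <- do.call(rbind, lapply(split(dat, dat$variable), function(g) {
  data.frame(
    variable = g$variable[1],
    mean     = mean(g$mean),            # unweighted average of replicate means
    sd       = pooled_sd(g$sd, g$n)     # pooled SD under the assumed formula
  )
}))
rownames(combined) <- NULL
combined
```

Base R's split and lapply are used instead of aggregate because the summary needs two columns (sd and n) at once, which aggregate does not pass to a single function directly.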