I'm going to assume a runif
jittering, though this method can work with other functions/parameters as well (with modification).
Create a frame with the grouping variable(s) and the parameters for jittering, such as
jitters <- data.frame(
cyl = c(4L, 6L, 8L),
neg = c(0, -0.3, -0.1),
pos = c(0.5, 0, 0.1)
)
jitters
# cyl neg pos
# 1 4 0.0 0.5
# 2 6 -0.3 0.0
# 3 8 -0.1 0.1
From here, we can runif
across all variables at once with a simple:
set.seed(42)
out <- merge(mtcars, jitters, by = "cyl", all.x = TRUE) |>
transform(mpg2 = mpg + runif(nrow(mtcars), neg, pos))
out
# cyl mpg disp hp drat wt qsec vs am gear carb neg pos mpg2
# 1 4 22.8 140.8 95 3.92 3.150 22.90 1 0 4 2 0.0 0.5 23.25740
# 2 4 22.8 108.0 93 3.85 2.320 18.61 1 1 4 1 0.0 0.5 23.26854
# 3 4 24.4 146.7 62 3.69 3.190 20.00 1 0 4 2 0.0 0.5 24.54307
# 4 4 21.5 120.1 97 3.70 2.465 20.01 1 0 3 1 0.0 0.5 21.91522
# 5 4 30.4 75.7 52 4.93 1.615 18.52 1 1 4 2 0.0 0.5 30.72087
# 6 4 33.9 71.1 65 4.22 1.835 19.90 1 1 4 1 0.0 0.5 34.15955
# 7 4 26.0 120.3 91 4.43 2.140 16.70 0 1 5 2 0.0 0.5 26.36829
# 8 4 30.4 95.1 113 3.77 1.513 16.90 1 1 5 2 0.0 0.5 30.46733
# 9 4 32.4 78.7 66 4.08 2.200 19.47 1 1 4 1 0.0 0.5 32.72850
# 10 4 21.4 121.0 109 4.11 2.780 18.60 1 1 4 2 0.0 0.5 21.75253
# 11 4 27.3 79.0 66 4.08 1.935 18.90 1 1 4 1 0.0 0.5 27.52887
# 12 6 21.0 160.0 110 3.90 2.620 16.46 0 1 4 4 -0.3 0.0 20.91573
# 13 6 21.0 160.0 110 3.90 2.875 17.02 0 1 4 4 -0.3 0.0 20.98040
# 14 6 17.8 167.6 123 3.92 3.440 18.90 1 0 4 4 -0.3 0.0 17.57663
# 15 6 21.4 258.0 110 3.08 3.215 19.44 1 0 3 1 -0.3 0.0 21.23869
# 16 6 18.1 225.0 105 2.76 3.460 20.22 1 0 3 1 -0.3 0.0 18.08200
# 17 6 19.2 167.6 123 3.92 3.440 18.30 1 0 4 4 -0.3 0.0 19.19347
# 18 6 19.7 145.0 175 3.62 2.770 15.50 0 1 5 6 -0.3 0.0 19.43525
# 19 8 18.7 360.0 175 3.15 3.440 17.02 0 0 3 2 -0.1 0.1 18.69500
# 20 8 17.3 275.8 180 3.07 3.730 17.60 0 0 3 3 -0.1 0.1 17.31207
# 21 8 14.3 360.0 245 3.21 3.570 15.84 0 0 3 4 -0.1 0.1 14.38081
# 22 8 14.7 440.0 230 3.23 5.345 17.42 0 0 3 4 -0.1 0.1 14.62774
# 23 8 10.4 472.0 205 2.93 5.250 17.98 0 0 3 4 -0.1 0.1 10.49778
# 24 8 16.4 275.8 180 3.07 4.070 17.40 0 0 3 3 -0.1 0.1 16.48933
# 25 8 19.2 400.0 175 3.08 3.845 17.05 0 0 3 2 -0.1 0.1 19.11649
# 26 8 15.2 275.8 180 3.07 3.780 18.00 0 0 3 3 -0.1 0.1 15.20284
# 27 8 15.2 304.0 150 3.15 3.435 17.30 0 0 3 2 -0.1 0.1 15.17804
# 28 8 10.4 460.0 215 3.00 5.424 17.82 0 0 3 4 -0.1 0.1 10.48115
# 29 8 15.8 351.0 264 4.22 3.170 14.50 0 1 5 4 -0.1 0.1 15.78939
# 30 8 15.5 318.0 150 2.76 3.520 16.87 0 0 3 2 -0.1 0.1 15.56720
# 31 8 15.0 301.0 335 3.54 3.570 14.60 0 1 5 8 -0.1 0.1 15.04752
# 32 8 13.3 350.0 245 3.73 3.840 15.41 0 0 3 4 -0.1 0.1 13.36221
If you need a different function and that function doesn't vectorize the parameters like this, you may need to Vectorize(func)(nrow(mtcars), neg, pos)
or similar, though that will slow things down (with large data).
We can "verify" (within reason) the jittering with:
aggregate(cbind(mpg, mpg2) ~ cyl, data = out,
FUN = function(z) c(min = min(z), max = max(z)))
# cyl mpg.min mpg.max mpg2.min mpg2.max
# 1 4 21.4 33.9 21.61789 34.00383
# 2 6 17.8 21.4 17.78727 21.36633
# 3 8 10.4 19.2 10.30779 19.23546
where with cyl == 4
, due to the range of -0, +0.5
we expect mpg.min <= mpg2.min
; with cyl == 6
and a range of -0.3, +0
we expect mpg.max >= mpg2.max
; etc.