```r
dat <- data.frame(
  group = c("a", "b", "c", "d", "e", "total"),
  count = c(NA, NA, 10, 21, 49, 85)
)
dat
  group count
1     a    NA
2     b    NA
3     c    10
4     d    21
5     e    49
6 total    85
```

Given the data frame above, how can I impute the `NA` values so that:

  1. the values for a to e add up to `total`, and
  2. each imputed value is less than 10?

A solution could either generate a nested data frame of all possibilities or replace each `NA` with a draw from a distribution or something similar. Thanks!

electronix384128

1 Answer

One way is to use `RcppAlgos::permuteGeneral()` to generate every ordered set of values (permutations with repetition) that sums to the target. From there, one set can be selected at random to replace the `NA`s.

```r
library(RcppAlgos)

# Count the NAs
n <- sum(is.na(dat$count))

# Sum target: the total minus the sum of the known values
target <- dat$count[dat$group == "total"] - sum(dat$count[dat$group != "total"], na.rm = TRUE)

# Generate all permutations (with repetition) of n values that sum to target
res <- permuteGeneral(
  0:min(9, target),  # Ensure all values are less than 10
  n,
  repetition = TRUE,
  constraintFun = "sum",
  comparisonFun = "==",
  limitConstraints = target
)
```

```r
# Permutations that meet the constraints:
res
     [,1] [,2]
[1,]    0    5
[2,]    5    0
[3,]    1    4
[4,]    4    1
[5,]    2    3
[6,]    3    2
```
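For comparison, the same candidate set can be built in base R as the "nested data frame of all possibilities" the question mentions: enumerate every combination with `expand.grid()` and keep the rows that hit the target. This is a sketch assuming non-negative integer counts; `all_fills` is an illustrative name, and the setup from the question is repeated so the block runs on its own.

```r
# Base-R sketch: enumerate every way to fill the NA slots with values
# 0..min(9, target) that sum to the target, without RcppAlgos.
dat <- data.frame(
  group = c("a", "b", "c", "d", "e", "total"),
  count = c(NA, NA, 10, 21, 49, 85)
)

is_total <- dat$group == "total"
n <- sum(is.na(dat$count))
target <- dat$count[is_total] - sum(dat$count[!is_total], na.rm = TRUE)

# One column per NA slot, each ranging over 0..min(9, target)
grid <- expand.grid(rep(list(0:min(9, target)), n))

# Keep only the rows whose values add up to the target
all_fills <- grid[rowSums(grid) == target, , drop = FALSE]
rownames(all_fills) <- NULL
names(all_fills) <- dat$group[is.na(dat$count)]

all_fills
```

The grid has `(min(9, target) + 1)^n` rows before filtering, so this brute-force version is only practical for a handful of `NA`s.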

```r
# Replace the NA values with a randomly chosen permutation
dat$count[is.na(dat$count)] <- res[sample(nrow(res), 1), ]

# One possible result (the row is drawn at random)
dat
  group count
1     a     3
2     b     2
3     c    10
4     d    21
5     e    49
6 total    85
```
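To confirm that a filled-in data frame meets both requirements from the question, a small check can be run on the result. `check_imputation()` is a hypothetical helper written for this sketch; it assumes the imputed values should lie between 0 and 9 and that the row labelled `"total"` holds the target.

```r
# Hypothetical helper: checks an imputed data frame against the question's
# two constraints (plus that no NA is left over).
check_imputation <- function(imputed, original, max_value = 9) {
  was_na <- is.na(original$count)
  is_total <- original$group == "total"
  filled <- imputed$count[was_na]
  c(
    no_missing = !anyNA(imputed$count),
    sums_match = sum(imputed$count[!is_total]) == imputed$count[is_total],
    within_max = all(filled >= 0 & filled <= max_value)
  )
}

original <- data.frame(
  group = c("a", "b", "c", "d", "e", "total"),
  count = c(NA, NA, 10, 21, 49, 85)
)

# The example result from the answer (a = 3, b = 2)
imputed <- original
imputed$count[is.na(imputed$count)] <- c(3, 2)

# All three checks should be TRUE for this fill
check_imputation(imputed, original)
```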
Ritchie Sacramento

Nice! `RcppAlgos` author here. As of `2.6.0`, we can avoid generating every [composition](https://en.wikipedia.org/wiki/Composition_(combinatorics)) by using `compositionsSample`, e.g. `compositionsSample(0:min(9, target), n, TRUE, target = target, n = 1)`. – Joseph Wood May 08 '23 at 15:39

@JosephWood - first of all, `RcppAlgos` is a great addition to the R ecosystem, thanks! I did look at the compositions functions after another user suggested the same. The target doesn't exceed the restricted maximum in this example, but I assume it could in general, so it seems `compositionsSample()` would not work as a general solution here. – Ritchie Sacramento May 09 '23 at 01:08
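Following up on the general case raised in the last comment (a target larger than the cap), one option that avoids enumerating every candidate is to count the bounded compositions with a small dynamic-programming table and then pick one value at a time, weighting each choice by how many ways the remaining slots can still be completed. That yields a uniform draw over all valid fills. This is a base-R sketch: `sample_bounded_fill()` is a hypothetical helper, not part of RcppAlgos, and it assumes the imputed values are non-negative integers, as in the answer.

```r
# Hypothetical helper (not part of RcppAlgos): draw one fill of n values,
# each in 0..max_value, summing to target, uniformly over all such fills.
sample_bounded_fill <- function(n, target, max_value = 9) {
  if (target < 0 || target > n * max_value) {
    stop("no fill of ", n, " values in 0..", max_value, " sums to ", target)
  }
  # ways[k + 1, s + 1]: number of length-k sequences over 0..max_value summing to s
  ways <- matrix(0, nrow = n + 1, ncol = target + 1)
  ways[1, 1] <- 1
  for (k in seq_len(n)) {
    for (s in 0:target) {
      v <- 0:min(max_value, s)
      ways[k + 1, s + 1] <- sum(ways[k, s - v + 1])
    }
  }
  fill <- integer(n)
  remaining <- target
  for (i in seq_len(n)) {
    v <- 0:min(max_value, remaining)
    # Weight each candidate by how many ways the n - i later slots can finish
    w <- ways[n - i + 1, remaining - v + 1]
    # Guard: sample() on a single number would sample from 1:v instead
    fill[i] <- if (length(v) == 1) v else sample(v, 1, prob = w)
    remaining <- remaining - fill[i]
  }
  fill
}

# Usage on the question's data
dat <- data.frame(
  group = c("a", "b", "c", "d", "e", "total"),
  count = c(NA, NA, 10, 21, 49, 85)
)
is_total <- dat$group == "total"
target <- dat$count[is_total] - sum(dat$count[!is_total], na.rm = TRUE)
dat$count[is.na(dat$count)] <- sample_bounded_fill(sum(is.na(dat$count)), target)
dat

# A target above the cap also works, e.g. three slots that must sum to 20
sample_bounded_fill(3, 20)
```

The counts in `ways` grow quickly with `n`, but they are only used as relative sampling weights, so double precision is adequate for the small sizes in this question.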