How to impute NA values or create all possible combinations?

Question

dat <- data.frame(
  group = c("a", "b", "c", "d", "e", "total"),
  count = c(NA, NA, 10, 21, 49, 85)
)
dat
  group count
1     a    NA
2     b    NA
3     c    10
4     d    21
5     e    49
6 total    85

Given the above data frame, how can I impute the NA values so that:

- the counts of a-e sum to total, and
- each imputed value is < 10?

A solution could either generate a nested data frame of all possibilities or replace the NAs with draws from a distribution, or something similar. Thanks!

Ritchie Sacramento · Accepted Answer · 2023-05-06T07:08:55.810


One way would be to use RcppAlgos::permuteGeneral() to generate all permutations (with repetition) of values below 10 that sum to the target. From there, one permutation can be selected at random to replace the NAs.

library(RcppAlgos)

# Count NAs 
n <- sum(is.na(dat$count))

# Find sum target
target <- dat$count[dat$group == "total"] - sum(dat$count[dat$group != "total"], na.rm = TRUE)

# Generate permutations of n values that sum to target
res <- permuteGeneral(
  0:min(9, target),  # Ensure all values are less than 10
  n,
  repetition = TRUE,
  constraintFun = "sum",
  comparisonFun = "==",
  limitConstraints = target
)

# Permutations that meet the constraints:
res
     [,1] [,2]
[1,]    0    5
[2,]    5    0
[3,]    1    4
[4,]    4    1
[5,]    2    3
[6,]    3    2

# Replace NA values with one randomly chosen permutation (result varies between runs)
dat$count[is.na(dat$count)] <- res[sample(nrow(res), 1), ]

dat
  group count
1     a     3
2     b     2
3     c    10
4     d    21
5     e    49
6 total    85
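
The question also asks for a nested set of all possibilities. A minimal base R sketch of that idea, without RcppAlgos, is shown below. It assumes the data frame is named `dat` as above; `na_idx`, `candidates`, `valid`, and `all_fills` are illustrative names, not part of any package.

```r
dat <- data.frame(
  group = c("a", "b", "c", "d", "e", "total"),
  count = c(NA, NA, 10, 21, 49, 85)
)

# Positions of the missing counts and the sum they must add up to
na_idx <- which(is.na(dat$count))
target <- dat$count[dat$group == "total"] -
  sum(dat$count[dat$group != "total"], na.rm = TRUE)

# Every combination of the values 0..9 for the missing cells
candidates <- expand.grid(rep(list(0:9), length(na_idx)))

# Keep only the combinations whose values sum to the target
valid <- candidates[rowSums(candidates) == target, , drop = FALSE]

# One completed copy of dat per valid combination
all_fills <- lapply(seq_len(nrow(valid)), function(i) {
  out <- dat
  out$count[na_idx] <- unlist(valid[i, ], use.names = FALSE)
  out
})

length(all_fills)  # 6 for the example data
```

Note that `expand.grid()` builds all 10^n candidate rows before filtering, so this is only practical for a small number of NAs; the constrained generation in RcppAlgos avoids that blow-up.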

edited May 06 '23 at 07:08

answered May 06 '23 at 06:57



Nice!! `RcppAlgos` author here. As of `2.6.0`, we can avoid generating every [composition](https://en.wikipedia.org/wiki/Composition_(combinatorics)) by utilizing `compositionsSample`. E.g. `compositionsSample(0:min(9, target), n, TRUE, target = target, n = 1)`. – Joseph Wood May 08 '23 at 15:39

@JosephWood - first of all, `RcppAlgos` is a great addition to the R ecosystem, thanks! I did look at the compositions functions after another user suggested the same. In this example the target does not exceed the restricted maximum, but I assume it could in general, so it seems `compositionsSample()` would not work as a general solution in this case. – Ritchie Sacramento May 09 '23 at 01:08
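
To illustrate the case raised in this comment, where the target is larger than the per-value cap, here is a small base R sketch. `bounded_fills()` is a hypothetical helper written for this example, not an RcppAlgos function; it enumerates by brute force, so it only suits a handful of missing values.

```r
# Hypothetical helper: all ways to fill n_missing cells with integers
# in 0..max_value so that the cells sum to target
bounded_fills <- function(n_missing, target, max_value = 9) {
  grid <- expand.grid(rep(list(0:max_value), n_missing))
  as.matrix(grid[rowSums(grid) == target, , drop = FALSE])
}

# Target above the cap: every value still stays below 10
fills <- bounded_fills(2, 12)
nrow(fills)  # 7, i.e. (3, 9), (4, 8), ..., (9, 3)

# Target that cannot be reached at all (2 * 9 < 19): no rows come back
nrow(bounded_fills(2, 19))  # 0

# Draw one fill at random only when at least one exists
if (nrow(fills) > 0) {
  pick <- fills[sample(nrow(fills), 1), ]
}
```

Checking `nrow()` before sampling matters because an empty result means the constraints cannot be satisfied, and indexing into it would fail.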
