
Is there a way to fill each individual missing value with a different random number when using dcast (from reshape2 or data.table)? Example:

ID = c('AA', 'AA', 'BB', 'BB', 'CC', 'CC', 'CC', 'DD', 'DD')
Replica = c('H1','H3','H1','H2','H1','H2','H3','H2','H3')
Value = c(1.3, 2.5, 1.4, 3.7, 9.5, 7.4, 7.1, 1.8, 8.4)

example <- data.frame(ID=ID, Replica = Replica, Value = Value)

Doing a simple dcast:

library(reshape2)
dfdc <- dcast(data=example, ID~Replica, value.var = 'Value')

Notice that some of the values are missing:

  ID  H1  H2  H3
1 AA 1.3  NA 2.5
2 BB 1.4 3.7  NA
3 CC 9.5 7.4 7.1
4 DD  NA 1.8 8.4

I would like to fill up each of those missing values with random numbers, something like:

dfdc <- dcast(data=example, ID~Replica, value.var = 'Value', fill = sample(1:10, 1))

which gives as a result:

  ID  H1  H2  H3
1 AA 1.3 2.0 2.5
2 BB 1.4 3.7 2.0
3 CC 9.5 7.4 7.1
4 DD 2.0 1.8 8.4

However, all the missing values have been replaced by the same random number (2 in this case).

Would it be possible to apply the function individually to each missing value and, therefore, fill the missing values with different random numbers?

Thanks in advance!

David JM

2 Answers

If you don't mind a warning, you could just use fill = sample(10); the unused values are dropped, and each of the three missing values still gets its own random number. Just make sure the sample is at least as long as the expected number of NA values.

dcast(example, ID ~ Replica, fill = sample(10))
#   ID   H1  H2  H3
# 1 AA  1.3 4.0 2.5
# 2 BB  1.4 3.7 1.0
# 3 CC  9.5 7.4 7.1
# 4 DD 10.0 1.8 8.4
# Warning message:
# In ordered[is.na(ordered)] <- fill :
#   number of items to replace is not a multiple of replacement length

Of course, you could simply wrap that with suppressWarnings() as well.

suppressWarnings(dcast(example, ID ~ Replica, fill = sample(10)))
#   ID  H1  H2  H3
# 1 AA 1.3 6.0 2.5
# 2 BB 1.4 3.7 5.0
# 3 CC 9.5 7.4 7.1
# 4 DD 9.0 1.8 8.4
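
A warning-free variation on the same idea, as a rough sketch rather than a definitive recipe: cast without fill, then overwrite only the NA cells with exactly as many draws as there are gaps. The name na_cells is just an illustrative helper, and drawing with replace = TRUE is an assumption (the same number may appear in more than one cell).

```r
library(reshape2)

example <- data.frame(
  ID      = c('AA', 'AA', 'BB', 'BB', 'CC', 'CC', 'CC', 'DD', 'DD'),
  Replica = c('H1', 'H3', 'H1', 'H2', 'H1', 'H2', 'H3', 'H2', 'H3'),
  Value   = c(1.3, 2.5, 1.4, 3.7, 9.5, 7.4, 7.1, 1.8, 8.4)
)

# Cast without fill so the gaps stay NA
dfdc <- dcast(example, ID ~ Replica, value.var = 'Value')

# Logical matrix marking the missing cells
na_cells <- is.na(dfdc)

# One independent draw per missing cell (assumption: repeats are acceptable)
dfdc[na_cells] <- sample(1:10, sum(na_cells), replace = TRUE)

dfdc
```

Because the replacement vector has exactly sum(na_cells) elements, no recycling warning is raised and there is no need to guess an upper bound on the number of gaps.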
Rich Scriven

Here is an option using the tidyverse:

library(tidyverse)
complete(example, ID, Replica) %>%
    mutate(Value = coalesce(Value, as.numeric(sample(1:10, n(), replace=TRUE)))) %>%
    spread(Replica, Value)
# A tibble: 4 × 4
#      ID    H1    H2    H3
#* <fctr> <dbl> <dbl> <dbl>
#1     AA   1.3   2.0   2.5
#2     BB   1.4   3.7   1.0
#3     CC   9.5   7.4   7.1
#4     DD   8.0   1.8   8.4
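
In more recent versions of tidyr, spread() is superseded by pivot_wider(), so the same pipeline might be written as below. This is only a sketch under that assumption; the object name wide and the set.seed() call are illustrative additions, not part of the original answer.

```r
library(dplyr)
library(tidyr)

example <- data.frame(
  ID      = c('AA', 'AA', 'BB', 'BB', 'CC', 'CC', 'CC', 'DD', 'DD'),
  Replica = c('H1', 'H3', 'H1', 'H2', 'H1', 'H2', 'H3', 'H2', 'H3'),
  Value   = c(1.3, 2.5, 1.4, 3.7, 9.5, 7.4, 7.1, 1.8, 8.4)
)

set.seed(1)  # only to make the random fill reproducible

wide <- example %>%
  # Add the missing ID/Replica combinations as rows with Value = NA
  complete(ID, Replica) %>%
  # Draw one candidate per row; coalesce() uses it only where Value is NA
  mutate(Value = coalesce(Value, as.numeric(sample(1:10, n(), replace = TRUE)))) %>%
  # Reshape to one column per Replica
  pivot_wider(names_from = Replica, values_from = Value)

wide
```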
akrun