I'm learning to use data.table and am trying to replace the NA values in `c` with the non-missing value of `c` within each group of `b`.

library(data.table)
dt <- data.table(a = rep(1:3, 2),
                 b = c(rep(1,3), rep(2, 3)),
                 c = c(NA, 4, NA, 6, NA, NA))

> dt
   a b  c
1: 1 1 NA
2: 2 1  4
3: 3 1 NA
4: 1 2  6
5: 2 2 NA
6: 3 2 NA

I would like to get this:

> dt
   a b  c
1: 1 1  4
2: 2 1  4
3: 3 1  4
4: 1 2  6
5: 2 2  6
6: 3 2  6

I tried these, but none gives the desired result.

dt[, c := ifelse(is.na(c), !is.na(c), c), by = b]
dt[is.na(c), c := dt[!is.na(c), .(c)], by = b]

I would appreciate some help, along with a short explanation of how I should think about a problem like this when solving it the data.table way.

yuskam
  • Will you have the case where there is more than one `c` value for each `b` group? – SymbolixAU Feb 05 '19 at 00:17
  • Not in the case I currently have at hand, but I suppose I could add a condition for choosing the value once the problem I posted is solved – yuskam Feb 05 '19 at 09:50

1 Answer

Assuming a simple case where there is just one c for each level of b:

dt[, c := c[!is.na(c)][1], by = b]
dt
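
As for how to think about it: `by = b` evaluates the expression in `j` once per group, and inside each group `c` is just that group's vector. A rough sketch of the two steps on the question's data follows (the names `per_group` and `fill` are only illustrative, not part of data.table):

```r
library(data.table)

# Same example data as in the question
dt <- data.table(a = rep(1:3, 2),
                 b = c(rep(1, 3), rep(2, 3)),
                 c = c(NA, 4, NA, 6, NA, NA))

# Step 1: inspect what the expression returns for each group of b,
# without modifying dt. c[!is.na(c)] drops the NAs; [1] keeps the first value left.
per_group <- dt[, .(fill = c[!is.na(c)][1]), by = b]

# Step 2: assign by reference with `:=`; the single value returned for
# each group is recycled over all rows of that group.
dt[, c := c[!is.na(c)][1], by = b]
dt
```

If a group had no non-missing value at all, `c[!is.na(c)][1]` would be NA, so that group would simply stay NA.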
s_baldur
  • Should be `!is.na`. Thanks for another suggestion. – yuskam Feb 05 '19 at 09:56
  • @yuskam Thanks for pointing that out. Using `[1]` should be faster than `unique()` if the vector is large. – s_baldur Feb 05 '19 at 10:00
  • I also realised that it's easier to select the desired value if `c` has more than one non-missing value per group, so I accepted your answer since it solves more than one issue. – yuskam Feb 05 '19 at 10:17