22

I have a dataframe with some NA values:

dfa <- data.frame(a=c(1,NA,3,4,5,NA),b=c(1,5,NA,NA,8,9),c=c(7,NA,NA,NA,2,NA))
dfa

I would like to replace the NAs with values in the same position in another dataframe:

dfrepair <- data.frame(a=c(2:7),b=c(6:1),c=c(8:3))
dfrepair

I tried:

dfa1 <- dfa

dfa1 <- ifelse(dfa == NA, dfrepair, dfa)
dfa1

but this did not work.

Ferdi
  • 540
  • 3
  • 12
  • 23
adam.888
  • 7,686
  • 17
  • 70
  • 105

5 Answers5

21

You can do:

dfa <- data.frame(a=c(1,NA,3,4,5,NA),b=c(1,5,NA,NA,8,9),c=c(7,NA,NA,NA,2,NA))
dfrepair <- data.frame(a=c(2:7),b=c(6:1),c=c(8:3))
dfa[is.na(dfa)] <- dfrepair[is.na(dfa)]
dfa

  a b c
1 1 1 7
2 3 5 7
3 3 4 6
4 4 3 5
5 5 8 2
6 7 9 3
gfgm
  • 3,627
  • 14
  • 34
  • 7
    If creating a new table instead of modifying dfa: `replace(dfa, is.na(dfa), dfrepair[is.na(dfa)])` – Frank Dec 16 '16 at 00:15
  • In case the `data.frame` has different types, this way might change the type of the columns. Try it with `dfrepair <- data.frame(a=letters[2:7],b=c(6:1),c=c(8:3))` what will change *all* columns of `dfa` from `numeric` to `character`. – GKi Jan 19 '23 at 09:18
8

In the tidyverse, you can use purrr::map2_df, which is a strictly bivariate version of mapply that simplifies to a data.frame, and dplyr::coalesce, which replaces NA values in its first argument with the corresponding ones in the second.

library(tidyverse)

dfrepair %>% 
    mutate_all(as.numeric) %>%    # coalesce is strict about types
    map2_df(dfa, ., coalesce)

## # A tibble: 6 × 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1     1     1     7
## 2     3     5     7
## 3     3     4     6
## 4     4     3     5
## 5     5     8     2
## 6     7     9     3
alistaire
  • 42,459
  • 4
  • 77
  • 117
8

We can use Map from base R to do a columnwise comparison between the two datasets

dfa[] <- Map(function(x,y) {x[is.na(x)] <- y[is.na(x)]; x}, dfa, dfrepair)
dfa
#  a b c
#1 1 1 7
#2 3 5 7
#3 3 4 6
#4 4 3 5
#5 5 8 2
#6 7 9 3
akrun
  • 874,273
  • 37
  • 540
  • 662
2
dfa <- data.frame(a=c(1,NA,3,4,5,NA),b=c(1,5,NA,NA,8,9),c=c(7,NA,NA,NA,2,NA))
dfa
dfrepair <- data.frame(a=c(2:7),b=c(6:1),c=c(8:3))
dfrepair 
library(dplyr)
coalesce(as.numeric(dfa), as.numeric(dfrepair))

  a b c
1 1 1 7
2 3 5 7
3 3 4 6
4 4 3 5
5 5 8 2
6 7 9 3

As the code in dplyr is written in C++ it is faster in most cases. An other important advantage is that coalesce as well as many other dplyr functions are the same in SQL. Using dplyr you learn SQL by coding in R. ;-)

Ferdi
  • 540
  • 3
  • 12
  • 23
  • 1
    The `as.numeric`s error out for me, though `coalesce(dfa, dfrepair)` surprisingly _does_ work (the docs only talk about vectors, not whole data.frames) and doesn't complain about the different types. – alistaire Dec 19 '16 at 18:56
  • Thank you for the as.numeric. ;-) A data.frame is a list of list and vectors. In the case above as well as in many other cases it is just a list of vectors containing numbers. Therefore in our case it is very similar to a matrix. If the data.frame contains recursive lists my code will not run. – Ferdi Dec 20 '16 at 10:45
  • A data.frame may look like a matrix, but the underlying structure is very different. A data.frame is a list of one-dimensional vectors, whereas a matrix is a single vector with two dimensions. Most functions designed for vectors don't work on data.frames; the subsetting in `coalesce` must be fortuitous. – alistaire Dec 20 '16 at 10:56
  • `A <- matrix(1:9, nrow = 3) is.vector(A)` renders `FALSE`. According to this test a matrix is not a single vector in R. – Ferdi Dec 21 '16 at 08:25
  • From [the language definition](https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Attributes): "Matrices and arrays are simply vectors with the attribute dim and optionally dimnames attached to the vector." E.g. `x <- diag(3); attributes(x) <- NULL; x` Lists are vectors too (generic, not atomic), e.g. `is.vector(list(1))`, making the word confusing. `is.vector` just checks for attributes other than names, which is why `is.vector(iris)` returns `FALSE` even though `typeof(iris)` returns `"list"`. A better check: `purrr::is_vector(diag(3)); purrr::is_atomic(diag(3))`: `TRUE`; `TRUE`. – alistaire Dec 21 '16 at 15:04
0

In case there are different types the replacement should be done columnwise. Another simple way allowing in place exchange might be.

for(i in seq_along(dfa)) {
    . <- is.na(dfa[[i]])
    dfa[[i]][.] <- dfrepair[[i]][.]
}

Or using in addition which which might improve speed / memory usage in some cases.

for(i in seq_along(dfa)) {
    . <- which(is.na(dfa[[i]]))
    dfa[[i]][.] <- dfrepair[[i]][.]
}

Benchmark of columnwise base options.

dfa <- data.frame(a=c("A",NA,"B","C","D",NA),b=c(1,5,NA,NA,8,9),c=c(7,NA,NA,NA,2,NA))
dfrepair <- data.frame(a=letters[2:7],b=c(6:1),c=c(8:3))

bench::mark(
akrun = local({dfa[] <- Map(function(x,y) {x[is.na(x)] <- y[is.na(x)]; x}, dfa, dfrepair); dfa}),
GKi = local({for(i in seq_along(dfa)) {. <- is.na(dfa[[i]])
                 dfa[[i]][.] <- dfrepair[[i]][.]}
                 dfa})
)
#  expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#  <bch:expr> <bch:tm> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#1 akrun        64.4µs 70.3µs    12895.      280B     26.7  5793    12      449ms
#2 GKi          54.8µs   60µs    16347.      280B     28.7  7395    13      452ms
GKi
  • 37,245
  • 2
  • 26
  • 48