I want to add a new column to a data.frame whose value is TRUE when the row contains at least one missing value and FALSE otherwise.

For that problem, apply is a perfect use case:

EDIT - added example

tab <- data.frame(a = 1:10, b = c(NA, letters[2:10]), c = c(LETTERS[1:9], NA))

tab$missing <- apply(tab, 1, function(x) any(is.na(x)))

However, I loaded the strict package and got this error: apply() coerces X to a matrix so is dangerous to use with data frames. Please use lapply() instead.

I know that I can safely ignore this error. However, I was wondering whether there is a simple way to code this with one of the tidyverse packages. I tried with dplyr, unsuccessfully:

tab %>% 
  rowwise() %>% 
  mutate(missing = any(is.na(.), na.rm = TRUE))
Kevin Zarca
  • Have you tried using `purrr::by_row()`? – Hanjo Odendaal Jul 06 '17 at 14:51
  • @HanjoJo'burgOdendaal That was sadly deprecated and moved to [purrrlyr](https://github.com/hadley/purrrlyr/) – alistaire Jul 06 '17 at 14:52
  • btw, you could also avoid the nasty `apply` with `MARGIN = 1` and do something vectorized along the lines of `rowSums(is.na(tab)) > 0` – Sotos Jul 06 '17 at 14:54
  • Try it this way: `apply(is.na(tab), 1, any)` or `vapply(split(tab, 1:nrow(tab)), f, logical(1))` where `f` is the anonymous function in the question. – G. Grothendieck Jul 06 '17 at 15:05
  • I think the title of this question is misleading. It should be along the lines of "How to add a column to a data.frame with Tidyverse". No answer addresses what the Tidyverse equivalent of the apply "by row" function is. Suggested edit queue is currently full. – Kasper Thystrup Karstensen Feb 08 '22 at 10:06
  • @HanjoOdendaal actually provides one answer to the title of this question, although by_row is now deprecated – Kasper Thystrup Karstensen Feb 08 '22 at 10:08

3 Answers

If you want to avoid coercing to a matrix, you can use purrr::pmap, which iterates across the elements of a list in parallel and passes them to a function:

library(tidyverse)

# tibble() replaces the deprecated data_frame()
tab <- tibble(a = 1:10, 
              b = c(NA, letters[2:10]), 
              c = c(LETTERS[1:9], NA))

tab %>% mutate(missing = pmap_lgl(., ~any(is.na(c(...)))))
#> # A tibble: 10 x 4
#>        a     b     c missing
#>    <int> <chr> <chr>   <lgl>
#>  1     1  <NA>     A    TRUE
#>  2     2     b     B   FALSE
#>  3     3     c     C   FALSE
#>  4     4     d     D   FALSE
#>  5     5     e     E   FALSE
#>  6     6     f     F   FALSE
#>  7     7     g     G   FALSE
#>  8     8     h     H   FALSE
#>  9     9     i     I   FALSE
#> 10    10     j  <NA>    TRUE

In the function, c(...) is needed to collect all the arguments passed through ... into a single vector, which can then be passed to is.na and collapsed with any. The *_lgl-suffixed variant of pmap simplifies the result to a logical vector.
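
To see what the anonymous function receives on each call, here is a minimal base-R sketch of the same row-wise pattern. The helper name row_has_na is made up for illustration, and mapply() stands in for pmap_lgl() so that no packages are needed; it is an analogy, not a description of what purrr does internally.

```r
# Same example data as in the question (base data.frame, no packages needed)
tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA))

# Hypothetical helper: gather one row's values from ... into a single vector,
# then check whether any of them is NA
row_has_na <- function(...) any(is.na(c(...)))

# Called by hand with the values of the first and second rows
first_row  <- row_has_na(a = 1L, b = NA, c = "A")
second_row <- row_has_na(a = 2L, b = "b", c = "B")

# mapply() walks the columns in parallel, one call per row,
# which is roughly the role pmap_lgl() plays in the answer
missing <- unname(do.call(mapply, c(list(FUN = row_has_na), tab)))
```

Because c() puts an integer and two strings into one vector, the values are converted to character on each call; that is harmless here since is.na() still sees the NA.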

Note that while this avoids coercing to matrix, it will not necessarily be faster than approaches that do, as matrix operations are highly optimized in R. It may make more sense to explicitly coerce to a matrix, e.g.

tab %>% mutate(missing = rowSums(is.na(as.matrix(.))) > 0)

which returns the same thing.
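
As a quick check in base R: is.na() has a data frame method that returns a logical matrix directly, so the row sums can also be taken without an explicit as.matrix(). The sketch below compares both spellings with the apply(is.na(tab), 1, any) variant suggested in the comments; the variable names are only illustrative.

```r
tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA))

# is.na() on a data frame yields a logical matrix (one cell per value)
na_mat <- is.na(tab)

# Count NAs per row and flag rows with at least one
via_rowsums   <- unname(rowSums(na_mat) > 0)
via_as_matrix <- unname(rowSums(is.na(as.matrix(tab))) > 0)
via_apply     <- unname(apply(na_mat, 1, any))

# All three spellings flag the same rows
stopifnot(identical(via_rowsums, via_as_matrix),
          identical(via_rowsums, via_apply))
```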

alistaire

This works for the example data:

library(tidyverse)

# tibble() replaces the deprecated data_frame()
tab <- tibble(a = 1:10, 
              b = c(NA, letters[2:10]), 
              c = c(LETTERS[1:9], NA))

# TRUE when either b or c is NA (only these two columns are checked)
tab_1 <- tab %>% mutate(missing = is.na(b) | is.na(c))

> tab_1
    a    b    c missing
1   1 <NA>    A    TRUE
2   2    b    B   FALSE
3   3    c    C   FALSE
4   4    d    D   FALSE
5   5    e    E   FALSE
6   6    f    F   FALSE
7   7    g    G   FALSE
8   8    h    H   FALSE
9   9    i    I   FALSE
10 10    j <NA>    TRUE
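
The code above names columns b and c explicitly, so it has to be extended by hand for every additional column. One way to generalise it, sketched here in base R, is to compute is.na() per column and combine the results with |. With dplyr 1.0.4 or later, if_any() expresses the same idea inside mutate(); that part is guarded because it assumes the package is installed at that version.

```r
tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA))

# One logical vector per column, OR-ed together element-wise
missing_base <- Reduce(`|`, lapply(tab, is.na))

# Optional dplyr spelling (assumes dplyr >= 1.0.4, where if_any() was added)
if (requireNamespace("dplyr", quietly = TRUE) &&
    utils::packageVersion("dplyr") >= "1.0.4") {
  tab_1 <- dplyr::mutate(tab, missing = dplyr::if_any(dplyr::everything(), is.na))
  # Both approaches should flag the same rows
  stopifnot(all(tab_1$missing == missing_base))
}
```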
Rory Shaw

You can use the complete.cases function:

tab %>% mutate(missing = !complete.cases(.))

To remove rows with one or more NAs, use:

tab %>% filter(complete.cases(.))
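
For reference, complete.cases() comes from the stats package, which is attached by default, and returns TRUE for rows with no missing values, so both lines also work without the pipe. A small sketch on the question's data, with illustrative variable names:

```r
tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA))

# TRUE where the row has no NA, so negate it to flag incomplete rows
tab$missing <- !complete.cases(tab)

# Keep only the fully observed rows, checking the three original columns
complete_rows <- tab[complete.cases(tab[c("a", "b", "c")]), ]
```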
wint3rschlaefer