Add a column with count of NAs and Mean

Question

I have a data frame and need to add two columns to it: one with the count of NAs across the other columns in each row, and one with the mean of that row's non-NA values. I think this can be done with dplyr.

> df1 <- data.frame(a = 1:5, b = c(1,2,NA,4,NA), c = c(NA,2,3,NA,NA))
> df1
  a  b  c
1 1  1 NA
2 2  2  2
3 3 NA  3
4 4  4 NA
5 5 NA NA

I want to mutate one column that counts the NAs in each row and another that holds the mean of all the non-NA values in that row.

This generally isn't a forum to ask us to write your code for you. What have you tried? Why do you want to use `dplyr`? FWIW, this can be done in base R quite easily any number of ways. One is: `df1$na <- apply(is.na(df1), 1, sum)` — Justin, Feb 16 '16 at 21:30
The `dplyr` way is described here: http://stackoverflow.com/questions/21818181/applying-a-function-to-every-row-of-a-table-using-dplyr — Stephen Henderson, Feb 16 '16 at 21:42

maloneypatr · Accepted Answer · 2020-10-02T14:37:20.463

23

library(dplyr)

# Helper: number of NA values in a vector
count_na <- function(x) sum(is.na(x))

df1 %>%
  mutate(means = rowMeans(., na.rm = TRUE),     # row mean of the non-NA values
         count_na = apply(., 1, count_na))      # NA count per row

#### ANSWER FOR RADEK ####
selected_cols <- c('b', 'c')

df1 %>%
  mutate(means = rowMeans(.[selected_cols], na.rm = TRUE),
         count_na = apply(.[selected_cols], 1, count_na))
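
As a quick sanity check on the selected-column variant, here is a minimal base R sketch of the same two summaries on the question's data. It assumes the columns of interest are b and c; the name `selected_cols` is only illustrative. Note that a row with no non-NA value in those columns gets NaN for its mean.

```r
df1 <- data.frame(a = 1:5, b = c(1, 2, NA, 4, NA), c = c(NA, 2, 3, NA, NA))

# Illustrative choice of columns (an assumption, not fixed by the question)
selected_cols <- c("b", "c")

# Base R equivalents of the two summaries, restricted to the chosen columns
means    <- rowMeans(df1[selected_cols], na.rm = TRUE)
count_na <- rowSums(is.na(df1[selected_cols]))

means     # 1 2 3 4 NaN  (row 5 has no non-NA value in b or c)
count_na  # 1 0 1 1 2
```

If NaN is not wanted for all-NA rows, it can be replaced afterwards, for example with `means[is.nan(means)] <- NA`.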

edited Oct 02 '20 at 14:37

answered Feb 16 '16 at 22:27

maloneypatr


How would you modify this solution to work only on selected columns? For instance b & c? – radek Nov 21 '17 at 07:31
@radek see my answer on this page – Jerry T Nov 22 '17 at 07:29

@radek - I updated the solution to answer your question. – maloneypatr Oct 02 '20 at 14:38

Jerry T · Answer 2 · 2017-11-22T07:28:37.343

13

As mentioned here https://stackoverflow.com/a/37732069/2292993

library(dplyr)

df1 <- data.frame(a = 1:5, b = c(1,2,NA,4,NA), c = c(NA,2,3,NA,NA))

df1 %>%
  mutate(means = rowMeans(., na.rm = TRUE),
         count_na = rowSums(is.na(.)))

To count NAs in selected columns only (this example uses columns a and c; means is still computed over all columns):

df1 %>%
  mutate(means = rowMeans(., na.rm = TRUE),
         count_na = rowSums(is.na(select(., one_of(c('a', 'c'))))))

edited Nov 22 '17 at 07:28

answered Nov 22 '17 at 07:16

Jerry T


score 8 · Answer 3 · answered Feb 16 '16 at 21:44


You can try this:

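# Remember the original columns so the new columns are not included in later summaries
cols <- names(df1)

# Find the row mean and add it to a new column in the data frame
df1$Mean <- rowMeans(df1[cols], na.rm = TRUE)

# Find the count of NA and add it to a new column in the data frame
# (rowSums works directly on the logical matrix from is.na)
df1$CountNa <- rowSums(is.na(df1[cols]))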

answered Feb 16 '16 at 21:44

windrunn3r.1990


score 1 · Answer 4 · answered Sep 19 '21 at 09:30

I recently faced a variation on this question where I needed to compute the percent of complete values, but for specific variables (not all variables). Here is an approach that worked for me.

df1 %>% 
  # create dummy variables representing if the observation is missing ----
  # can modify here for specific variables ----
  mutate_all(list(dummy = is.na)) %>% 
  # compute a row wise sum of missing ----
  rowwise() %>% 
  mutate(
    # number of missing observations ----
    n_miss = sum(c_across(matches("_dummy"))),
    # percent of observations that are complete (non-missing) ----
    pct_complete = 1 - mean(c_across(matches("_dummy")))
  ) %>% 
  # remove grouping from rowwise ---- 
  ungroup() %>% 
  # remove dummy variables ----
  dplyr::select(-matches("dummy"))
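
For comparison, the same two quantities can be sketched without intermediate dummy columns, since `is.na()` on a data frame returns a logical matrix that `rowSums()` and `rowMeans()` accept directly. This is a base R sketch under the assumption that only b and c are of interest; the name `vars` is hypothetical.

```r
df1 <- data.frame(a = 1:5, b = c(1, 2, NA, 4, NA), c = c(NA, 2, 3, NA, NA))

# Hypothetical subset of variables to check for missingness
vars <- c("b", "c")

# Logical matrix: TRUE where a value is missing
miss <- is.na(df1[vars])

df1$n_miss       <- rowSums(miss)        # number of missing values per row
df1$pct_complete <- 1 - rowMeans(miss)   # share of non-missing values per row

df1$n_miss        # 1 0 1 1 2
df1$pct_complete  # 0.5 1.0 0.5 0.5 0.0
```

The vectorized form avoids `rowwise()`, which can matter on larger data frames.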
