209

I want to count the number of NA values in a data frame column. Say my data frame is called df, and the name of the column I am considering is col. The way I have come up with is following:

sapply(df$col, function(x) sum(length(which(is.na(x)))))  

Is this a good/most efficient way to do this?

Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
user3274289
  • 2,426
  • 3
  • 16
  • 14

17 Answers17

430

You're over-thinking the problem:

sum(is.na(df$col))
rrs
  • 9,615
  • 4
  • 28
  • 38
  • Thank you for this. To expand this a little bit. In counting amount of arbitrary `value`, other than `NA` is writing a boolean function `is.value` and then using `sum(is.value(df$col))` the way to go or is there a more concise direct syntax for this? – user3274289 Jun 04 '14 at 02:11
  • 3
    Was too quick to ask. `sum(df$col==value,na.rm=FALSE)` does the trick. – user3274289 Jun 04 '14 at 02:18
  • 4
    @user3274289: although you'll usually want `na.rm=TRUE`, because otherwise if `df$col` contains NAs, `sum` will return `NA`. – jbaums Jun 04 '14 at 02:30
  • 1
    Sometimes I think I am over-thinking, till I got this answer...well, it's true... – Rugal Feb 29 '16 at 04:40
  • sorry but this does not work for me. I get this warning Warning message: In is.na(nom$wd) : is.na() applied to non-(list or vector) of type 'NULL', and the count is just zero. – Herman Toothrot Mar 16 '16 at 17:35
  • you can also count non-NA values with `sum(!is.na(df$col))` – mschmidt Nov 24 '22 at 07:31
101

If you are looking for NA counts for each column in a dataframe then:

na_count <-sapply(x, function(y) sum(length(which(is.na(y)))))

should give you a list with the counts for each column.

na_count <- data.frame(na_count)

Should output the data nicely in a dataframe like:

----------------------
| row.names | na_count
------------------------
| column_1  | count
Kevin Ogoro
  • 1,107
  • 1
  • 9
  • 8
73

Try the colSums function

df <- data.frame(x = c(1,2,NA), y = rep(NA, 3))

colSums(is.na(df))

#x y 
#1 3 
Tony Ladson
  • 3,539
  • 1
  • 23
  • 30
  • 4
    If you are dealing with many colums, you can reach a nicer output with ´ colSums(is.na(df)) %>% as.data.frame() ´ or ´ as.data.frame(colSums(is.na(df))) ´ – BMLopes Sep 30 '20 at 19:39
37

A quick and easy Tidyverse solution to get a NA count for all columns is to use summarise_all() which I think makes a much easier to read solution than using purrr or sapply

library(tidyverse)
# Example data
df <- tibble(col1 = c(1, 2, 3, NA), 
             col2 = c(NA, NA, "a", "b"))

df %>% summarise_all(~ sum(is.na(.)))
#> # A tibble: 1 x 2
#>    col1  col2
#>   <int> <int>
#> 1     1     2

Or using the more modern across() function:

df %>% summarise(across(everything(), ~ sum(is.na(.))))
Moohan
  • 933
  • 1
  • 9
  • 27
  • how would I get the NAs as a single number in total? – Ben Jun 08 '22 at 09:26
  • 3
    @Ben - if you want to save the output as a separate column you need to supply the `.names` argument. This uses a glue specification, e.g.: `df %>% summarise(across(everything(), ~ sum(is.na(.)), .names = "{.col}_na.count")` will create a new column for each variable, with the number of NAs for that variable. – cjdbarlow Oct 19 '22 at 02:32
24

If you are looking to count the number of NAs in the entire dataframe you could also use

sum(is.na(df))
bkielstr
  • 419
  • 2
  • 7
16

In the summary() output, the function also counts the NAs so one can use this function if one wants the sum of NAs in several variables.

Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
Shahin
  • 428
  • 4
  • 9
  • 2
    Worth noting that the `summary` output when used on a single column is useable, while its output from an entire data frame is character and the counts are difficult to extract if you need them later. See `c(summary(mtcars))`. – Rich Scriven Aug 20 '16 at 19:11
11

A tidyverse way to count the number of nulls in every column of a dataframe:

library(tidyverse)
library(purrr)

df %>%
    map_df(function(x) sum(is.na(x))) %>%
    gather(feature, num_nulls) %>%
    print(n = 100)
Abi K
  • 631
  • 2
  • 7
  • 13
8

This form, slightly changed from Kevin Ogoros's one:

na_count <-function (x) sapply(x, function(y) sum(is.na(y)))

returns NA counts as named int array

hute37
  • 197
  • 2
  • 8
6
sapply(name of the data, function(x) sum(is.na(x)))
UseR10085
  • 7,120
  • 3
  • 24
  • 54
UTKARSH
  • 111
  • 2
  • 2
  • 2
    See "[Explaining entirely code-based answers](https://meta.stackoverflow.com/q/392712/128421)". While this might be technically correct it doesn't explain why it solves the problem or should be the selected answer. We should educate in addition to help solve the problem. – the Tin Man May 24 '20 at 04:31
3

User rrs answer is right but that only tells you the number of NA values in the particular column of the data frame that you are passing to get the number of NA values for the whole data frame try this:

apply(<name of dataFrame>, 2<for getting column stats>, function(x) {sum(is.na(x))})

This does the trick

MF.OX
  • 2,366
  • 1
  • 24
  • 28
iec2011007
  • 1,828
  • 3
  • 24
  • 38
3

Try this:

length(df$col[is.na(df$col)])
Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
3

I read a csv file from local directory. Following code works for me.

# to get number of which contains na
sum(is.na(df[, c(columnName)]) # to get number of na row

# to get number of which not contains na
sum(!is.na(df[, c(columnName)]) 

#here columnName is your desire column name
reza.cse08
  • 5,938
  • 48
  • 39
3

If you're looking for null values in each column to be printed one after the other then you can use this. Simple solution.

lapply(df, function(x) { length(which(is.na(x)))})
Prakhar Rathi
  • 905
  • 1
  • 11
  • 25
2

Similar to hute37's answer but using the purrr package. I think this tidyverse approach is simpler than the answer proposed by AbiK.

library(purrr)
map_dbl(df, ~sum(is.na(.)))

Note: the tilde (~) creates an anonymous function. And the '.' refers to the input for the anonymous function, in this case the data.frame df.

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
Chris Kiniry
  • 499
  • 3
  • 13
2

Another option using complete.cases like this:

df <- data.frame(col = c(1,2,NA))
df
#>   col
#> 1   1
#> 2   2
#> 3  NA
sum(!complete.cases(df$col))
#> [1] 1

Created on 2022-08-27 with reprex v2.0.2

Quinten
  • 35,235
  • 5
  • 20
  • 53
0

You can use this to count number of NA or blanks in every column

colSums(is.na(data_set_name)|data_set_name == '')
0

In the interests of completeness you can also use the useNA argument in table. For example table(df$col, useNA="always") will count all of non NA cases and the NA ones.

dpel
  • 1,954
  • 1
  • 21
  • 31