Determine the number of NA values in a column

Question

I want to count the number of NA values in a data frame column. Say my data frame is called df, and the name of the column I am considering is col. The way I have come up with is following:

sapply(df$col, function(x) sum(length(which(is.na(x)))))

Is this a good/most efficient way to do this?

score 430 · Answer 1 · answered Jun 04 '14 at 01:31

430

You're over-thinking the problem:

sum(is.na(df$col))

answered Jun 04 '14 at 01:31

rrs

9,615
4
28
38

Thank you for this. To expand this a little bit. In counting amount of arbitrary `value`, other than `NA` is writing a boolean function `is.value` and then using `sum(is.value(df$col))` the way to go or is there a more concise direct syntax for this? – user3274289 Jun 04 '14 at 02:11
3

Was too quick to ask. `sum(df$col==value,na.rm=FALSE)` does the trick. – user3274289 Jun 04 '14 at 02:18
4

@user3274289: although you'll usually want `na.rm=TRUE`, because otherwise if `df$col` contains NAs, `sum` will return `NA`. – jbaums Jun 04 '14 at 02:30
1

Sometimes I think I am over-thinking, till I got this answer...well, it's true... – Rugal Feb 29 '16 at 04:40
sorry but this does not work for me. I get this warning Warning message: In is.na(nom$wd) : is.na() applied to non-(list or vector) of type 'NULL', and the count is just zero. – Herman Toothrot Mar 16 '16 at 17:35
you can also count non-NA values with `sum(!is.na(df$col))` – mschmidt Nov 24 '22 at 07:31

Kevin Ogoro · Answer 2 · 2015-06-30T18:40:48.863

101

If you are looking for NA counts for each column in a dataframe then:

na_count <-sapply(x, function(y) sum(length(which(is.na(y)))))

should give you a list with the counts for each column.

na_count <- data.frame(na_count)

Should output the data nicely in a dataframe like:

----------------------
| row.names | na_count
------------------------
| column_1  | count

edited Jun 30 '15 at 18:40

answered Jun 30 '15 at 18:32

Kevin Ogoro

1,107
1
9
8

1

To include the row names as a column, also run `na_count$name<-rownames(na_count)`. – Matt Nov 13 '15 at 17:02
9

`na_count <-sapply(x, function(y) sum(is.na(y)))` is a shorter alternative. – Vincent Bonhomme Mar 26 '16 at 10:19
1

Didn't work for me :( Had to change it to: na_count <- apply(x, function(y) sum(is.na(y)) , MARGIN = 2) – Angel Garcia Campos Jul 27 '17 at 09:36
I don't think we need to use both the sum and the length function (in the first na_count assignment)? Just length should be sufficient. – Yandle Feb 02 '19 at 17:33

score 73 · Answer 3 · answered Sep 07 '17 at 23:39

73

Try the colSums function

df <- data.frame(x = c(1,2,NA), y = rep(NA, 3))

colSums(is.na(df))

#x y 
#1 3

answered Sep 07 '17 at 23:39

Tony Ladson

3,539
1
23
30

4

If you are dealing with many colums, you can reach a nicer output with ´ colSums(is.na(df)) %>% as.data.frame() ´ or ´ as.data.frame(colSums(is.na(df))) ´ – BMLopes Sep 30 '20 at 19:39

Moohan · Answer 4 · 2021-04-07T11:21:24.497

37

A quick and easy Tidyverse solution to get a NA count for all columns is to use summarise_all() which I think makes a much easier to read solution than using purrr or sapply

library(tidyverse)
# Example data
df <- tibble(col1 = c(1, 2, 3, NA), 
             col2 = c(NA, NA, "a", "b"))

df %>% summarise_all(~ sum(is.na(.)))
#> # A tibble: 1 x 2
#>    col1  col2
#>   <int> <int>
#> 1     1     2

Or using the more modern across() function:

df %>% summarise(across(everything(), ~ sum(is.na(.))))

edited Apr 07 '21 at 11:21

answered Oct 31 '19 at 10:58

Moohan

933
1
9
27

how would I get the NAs as a single number in total? – Ben Jun 08 '22 at 09:26
3

@Ben - if you want to save the output as a separate column you need to supply the `.names` argument. This uses a glue specification, e.g.: `df %>% summarise(across(everything(), ~ sum(is.na(.)), .names = "{.col}_na.count")` will create a new column for each variable, with the number of NAs for that variable. – cjdbarlow Oct 19 '22 at 02:32

score 24 · Answer 5 · answered Jun 04 '14 at 05:40

24

If you are looking to count the number of NAs in the entire dataframe you could also use

sum(is.na(df))

answered Jun 04 '14 at 05:40

bkielstr

419
2
7

score 16 · Answer 6 · edited Aug 20 '16 at 19:09

16

In the summary() output, the function also counts the NAs so one can use this function if one wants the sum of NAs in several variables.

edited Aug 20 '16 at 19:09

Rich Scriven

97,041
11
181
245

answered Jun 19 '15 at 16:13

Shahin

428
4
9

2

Worth noting that the `summary` output when used on a single column is useable, while its output from an entire data frame is character and the counts are difficult to extract if you need them later. See `c(summary(mtcars))`. – Rich Scriven Aug 20 '16 at 19:11

score 11 · Answer 7 · answered May 17 '17 at 01:56

11

A tidyverse way to count the number of nulls in every column of a dataframe:

library(tidyverse)
library(purrr)

df %>%
    map_df(function(x) sum(is.na(x))) %>%
    gather(feature, num_nulls) %>%
    print(n = 100)

answered May 17 '17 at 01:56

Abi K

631
2
7
13

3

You don't even need purrr: `df %>% summarise_all(funs(sum(is.na(.))))` – David Bruce Borenstein Oct 21 '18 at 16:45
2

If you are lazy like me, you can write the same in @Abi K's answer in the somewhat shorter purrr syntax as: `df %>% map_df(~sum(is.na(.)))` or without dplyr as `map_df(~sum(is.na(df)))` – Agile Bean Apr 22 '19 at 06:44
Best solution for me as it gives the best colnames to proceed – Sebastian Hesse Jan 02 '23 at 15:37

score 8 · Answer 8 · answered Jan 09 '16 at 18:31

8

This form, slightly changed from Kevin Ogoros's one:

na_count <-function (x) sapply(x, function(y) sum(is.na(y)))

returns NA counts as named int array

answered Jan 09 '16 at 18:31

hute37

197
2
8

1

to get result as list: `na_count <-function (x) lapply(x, function(y) sum(is.na(y)))` – hute37 Jan 09 '16 at 18:33

score 6 · Answer 9 · edited May 23 '20 at 10:01

6

sapply(name of the data, function(x) sum(is.na(x)))

edited May 23 '20 at 10:01

UseR10085

7,120
3
24
54

answered Mar 29 '20 at 12:07

UTKARSH

111
2
2

2

See "[Explaining entirely code-based answers](https://meta.stackoverflow.com/q/392712/128421)". While this might be technically correct it doesn't explain why it solves the problem or should be the selected answer. We should educate in addition to help solve the problem. – the Tin Man May 24 '20 at 04:31

score 3 · Answer 10 · edited Dec 03 '16 at 23:43

3

User rrs answer is right but that only tells you the number of NA values in the particular column of the data frame that you are passing to get the number of NA values for the whole data frame try this:

apply(<name of dataFrame>, 2<for getting column stats>, function(x) {sum(is.na(x))})

This does the trick

edited Dec 03 '16 at 23:43

MF.OX

2,366
1
24
28

answered Oct 08 '15 at 11:45

iec2011007

1,828
3
24
38

There are some typos that make this code non-functional. Try this; `apply(df, 2, function(x) sum(is.na(x)))` – user3495945 Mar 07 '16 at 09:49

score 3 · Answer 11 · edited Aug 19 '16 at 20:50

3

Try this:

length(df$col[is.na(df$col)])

edited Aug 19 '16 at 20:50

Rich Scriven

97,041
11
181
245

answered Mar 25 '16 at 20:09

Rabish kumar Singh

55
3

score 3 · Answer 12 · answered May 25 '17 at 12:33

I read a csv file from local directory. Following code works for me.

# to get number of which contains na
sum(is.na(df[, c(columnName)]) # to get number of na row

# to get number of which not contains na
sum(!is.na(df[, c(columnName)]) 

#here columnName is your desire column name

score 3 · Answer 13 · answered Mar 20 '21 at 09:05

3

If you're looking for null values in each column to be printed one after the other then you can use this. Simple solution.

lapply(df, function(x) { length(which(is.na(x)))})

answered Mar 20 '21 at 09:05

Prakhar Rathi

905
1
11
25

score 2 · Answer 14 · edited Oct 31 '19 at 13:07

2

Similar to hute37's answer but using the purrr package. I think this tidyverse approach is simpler than the answer proposed by AbiK.

library(purrr)
map_dbl(df, ~sum(is.na(.)))

Note: the tilde (~) creates an anonymous function. And the '.' refers to the input for the anonymous function, in this case the data.frame df.

edited Oct 31 '19 at 13:07

Ben Bolker

211,554
25
370
453

answered Jul 29 '18 at 17:31

Chris Kiniry

499
3
13

Nice and clean but the colname is quite messy. Do you have a nice solution to call is "count_NA"? – Sebastian Hesse Jan 02 '23 at 15:34

score 2 · Answer 15 · answered Aug 27 '22 at 11:11

2

Another option using complete.cases like this:

df <- data.frame(col = c(1,2,NA))
df
#>   col
#> 1   1
#> 2   2
#> 3  NA
sum(!complete.cases(df$col))
#> [1] 1

^{Created on 2022-08-27 with reprex v2.0.2}

answered Aug 27 '22 at 11:11

Quinten

35,235
5
20
53

score 0 · Answer 16 · answered Dec 07 '17 at 19:39

0

You can use this to count number of NA or blanks in every column

colSums(is.na(data_set_name)|data_set_name == '')

answered Dec 07 '17 at 19:39

Prakhar Srivastava

69
1
3

score 0 · Answer 17 · answered Jan 27 '21 at 10:09

0

In the interests of completeness you can also use the useNA argument in table. For example table(df$col, useNA="always") will count all of non NA cases and the NA ones.

answered Jan 27 '21 at 10:09

dpel

1,954
1
21
31

Determine the number of NA values in a column

17 Answers17

Linked

Related