R: NA returned despite !is.na

Question

I have a simple data frame:

> df <- data.frame(i=c(1:20), x=c(1:10, rep(NA, 10)))
> df
    i  x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
11 11 NA
12 12 NA
13 13 NA
14 14 NA
15 15 NA
16 16 NA
17 17 NA
18 18 NA
19 19 NA
20 20 NA

I want to extract the row names of the non-NA rows, which I can do as follows:

> rownames(df[c(1:20),][!is.na(df$x),])
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

So far, so good. Now I want to skip the first row, but for some reason the command returns an output of the same length, and it even includes row "11", where x is NA.

> rownames(df[c(2:20),][!is.na(df$x),])
 [1] "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11"

It does not make sense to get a vector of the same length, let alone one containing a row that should have been excluded. As you can see in the data frame above, df$x[11] is definitely NA, so why does the result include a row that !is.na() should get rid of? To be more specific: I am trying to look at an extract of a data frame, but exclude rows containing NAs. I would be grateful for any advice!

Regarding your code, you simply forgot to filter every time `df` appears. To make it work, you can do `rownames(df[2:20,][!is.na(df[2:20,]$x),])` or `rownames(df[-1,][!is.na(df[-1,]$x),])` — Colonel Beauvel, Jul 13 '16 at 15:20
Thanks, this works perfectly for my purpose. I did indeed forget to filter df in the second part! — oepix, Jul 13 '16 at 15:25
The reason "11" is included is that the row names are taken from the data frame df[c(2:20),], which are "2" to "20" (see rownames(df[c(2:20),])), and the logical index then selects the first 10 of those, which include "11". As Colonel Beauvel mentioned, rownames(df[c(2:20),][!is.na(df[c(2:20),]$x),]) works as well — , Jul 13 '16 at 15:26
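To see the mismatch described in these comments, one can compare the sizes involved directly. A minimal sketch, using the question's df (the names `sub` and `keep` are just illustrative):

```r
df <- data.frame(i = 1:20, x = c(1:10, rep(NA, 10)))

sub  <- df[2:20, ]        # 19 rows, row names "2".."20"
keep <- !is.na(df$x)      # 20 logical values, computed on the FULL df

nrow(sub)                 # 19
length(keep)              # 20
which(keep)               # 1..10: read as positions within `sub`, not within `df`

# Position k of `sub` is original row k + 1, so positions 1..10 map to "2".."11"
rownames(sub)[which(keep)]
```

The mask is one element longer than the subset; here its extra element is FALSE, so the selection is simply shifted by one row.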

score 3 · Answer 1 · answered Jul 13 '16 at 15:18

We can extract the row names directly by indexing rownames(df) with the logical vector and then dropping the first element:

tail(rownames(df)[!is.na(df$x)], -1)
#[1] "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

Or instead of tail, we can use

rownames(df)[!is.na(df$x)][-1]
#[1] "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
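One caveat: tail(..., -1) and [-1] drop the first non-NA row name, which coincides with skipping row 1 only because df$x[1] is not NA. A sketch that skips rows by position instead, assuming the rows of interest are 2:20 (`df_na`, with a missing first value, is a hypothetical variant made up to show the difference):

```r
df <- data.frame(i = 1:20, x = c(1:10, rep(NA, 10)))

rows <- 2:20                                    # rows to look at, by position
rownames(df)[rows][!is.na(df$x[rows])]          # mask built from the SAME rows

# Hypothetical variant where row 1 itself is NA: the two approaches diverge
df_na <- df
df_na$x[1] <- NA
tail(rownames(df_na)[!is.na(df_na$x)], -1)      # drops "2", the first non-NA row
rownames(df_na)[rows][!is.na(df_na$x[rows])]    # still keeps "2".."10"
```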

score 2 · Accepted Answer · edited Jul 13 '16 at 15:33


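The problem is that !is.na(df$x) is indexed to df, not to df[c(2:20),]. !is.na(df$x) is TRUE for the first 10 elements, so when it is applied to df[c(2:20),] it selects that subset's first 10 rows, and rownames(df[c(2:20),][!is.na(df$x),]) returns the row names of rows 2 through 11 of df. Take the subset first and compute the mask on it: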

df2 <- df[c(2:20),]
rownames(df2[!is.na(df2$x),])    

# [1] "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
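If the goal stated in the question (exclude rows containing NAs from an extract) should cover every column rather than just x, base R's complete.cases() or na.omit() can be applied to the extract. A minimal sketch, assuming the same df:

```r
df <- data.frame(i = 1:20, x = c(1:10, rep(NA, 10)))

df2 <- df[2:20, ]                        # take the extract first ...
rownames(df2[complete.cases(df2), ])     # ... then drop rows with an NA in ANY column

# Same row names here, since only column x contains NAs
rownames(na.omit(df2))
```

Both check all columns, so they would also drop rows with an NA in i; for this df they agree with testing x alone.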

answered Jul 13 '16 at 15:26

David
