
I have a simple data frame:

> df <- data.frame(i=c(1:20), x=c(1:10, rep(NA, 10)))
> df
    i  x
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
11 11 NA
12 12 NA
13 13 NA
14 14 NA
15 15 NA
16 16 NA
17 17 NA
18 18 NA
19 19 NA
20 20 NA

I want to extract the rownames of the rows where x is not NA, which I can do as follows:

> rownames(df[c(1:20),][!is.na(df$x),])
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

So far so good. Now I want to skip the first row, but for some reason the command returns output of the same length, and it now even includes row 11, where x is NA.

> rownames(df[c(2:20),][!is.na(df$x),])
 [1] "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11"

It does not make sense to me that the result has the same length and even includes a row that should have been excluded. As the data frame above shows, df$x[11] is definitely NA, so why does the result include a row that !is.na() should filter out? To be more specific: I want to look at a subset of a data frame while excluding the rows that contain NAs. I would appreciate any advice!

M--
oepix
  • Regarding your code, you simply forgot to filter every time `df` appears. To make it work, you can use `rownames(df[2:20,][!is.na(df[2:20,]$x),])` or `rownames(df[-1,][!is.na(df[-1,]$x),])` – Colonel Beauvel Jul 13 '16 at 15:20
  • Thanks, this works perfectly for my purpose. I did indeed forget to filter df in the second part! – oepix Jul 13 '16 at 15:25
  • The reason it is included is that the code takes the rownames of the data frame `df[c(2:20),]`, which run from 2 to 20 (see `rownames(df[c(2:20),])`), and then returns the first 10 of those, which include "11". As Colonel Beauvel mentioned, `rownames(df[c(2:20),][!is.na(df[c(2:20),]$x),])` works as well –  Jul 13 '16 at 15:26

2 Answers


We can index the rownames directly with the logical vector and then drop the first element:

tail(rownames(df)[!is.na(df$x)], -1)
#[1] "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

Or instead of tail, we can use

rownames(df)[!is.na(df$x)][-1]
#[1] "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
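Note that both variants drop the first non-NA rowname, which coincides with skipping row 1 only because row 1 is not NA here. A minimal sketch of a position-based alternative follows; the names `skip`, `keep`, and `df_na_first` are made up for illustration and are not part of the question:

```r
# Recreate the data frame from the question
df <- data.frame(i = 1:20, x = c(1:10, rep(NA, 10)))

# Rows to skip by position (hypothetical name, chosen for this sketch)
skip <- 1

# Combine both conditions in one logical vector of length nrow(df),
# so the mask stays aligned with the rows of df
keep <- !is.na(df$x) & !(seq_len(nrow(df)) %in% skip)
result <- rownames(df)[keep]
result
# [1] "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

# The two approaches differ when the skipped row is itself NA
df_na_first <- df
df_na_first$x[1] <- NA
by_tail <- tail(rownames(df_na_first)[!is.na(df_na_first$x)], -1)
by_position <- rownames(df_na_first)[!is.na(df_na_first$x) &
                                       !(seq_len(nrow(df_na_first)) %in% skip)]
by_tail      # starts at "3": the first non-NA rowname ("2") was dropped
by_position  # starts at "2": only row 1 was skipped
```

Which behaviour is wanted depends on whether "skip the first row" means the first row of df or the first non-NA row.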
akrun

The problem is that !is.na(df$x) is indexed against df, not df[c(2:20),]. !is.na(df$x) is TRUE for the first 10 elements, so rownames(df[c(2:20),][!is.na(df$x),]) returns the rownames of the first 10 rows of df[c(2:20),], which are rows 2 through 11 of df. Build the filter from the subset instead:

df2 <- df[c(2:20),]
rownames(df2[!is.na(df2$x),])    

# [1] "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
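The misalignment can be made visible by comparing the length of the mask with the number of rows in the subset. This is only a sketch; the variable names (`mask_full`, `sub`, `wrong`, `right`) are illustrative, and the `complete.cases()` line is an alternative that assumes you want to drop rows with an NA in any column, not just x:

```r
# Recreate the data frame from the question
df <- data.frame(i = 1:20, x = c(1:10, rep(NA, 10)))

mask_full <- !is.na(df$x)  # one entry per row of df
sub <- df[2:20, ]          # subset without the first row

length(mask_full)  # 20
nrow(sub)          # 19

# The mask's positions refer to rows of df, but here they are applied to sub:
# TRUE at positions 1:10 selects sub's rows 1:10, i.e. df's rows 2:11
wrong <- rownames(sub[mask_full, ])
wrong
# [1] "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11"

# Building the mask from the subset keeps mask and rows aligned
right <- rownames(sub)[!is.na(sub$x)]
right
# [1] "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

# Alternative: keep only rows with no NA in any column
right_cc <- rownames(sub)[complete.cases(sub)]
```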
989
David