Find the index position of the first non-NA value in an R vector?

Question

I have a vector with a run of NAs at the beginning and data thereafter. The peculiarity of my data is that the first n non-NA values are probably unreliable, so I would like to replace them with NA.

For example, if I have a vector of length 20, and non-NAs start at index position 4:

> z
 [1]          NA          NA          NA -1.64801942 -0.57209233  0.65137286  0.13324344 -2.28339326
 [9]  1.29968050  0.10420776  0.54140323  0.64418164 -1.00949072 -1.16504423  1.33588892  1.63253646
[17]  2.41181291  0.38499825 -0.04869589  0.04798073

I would like to remove the first 3 non-NA values, which I believe to be unreliable, to give this:

> z
 [1]          NA          NA          NA          NA          NA          NA  0.13324344 -2.28339326
 [9]  1.29968050  0.10420776  0.54140323  0.64418164 -1.00949072 -1.16504423  1.33588892  1.63253646
[17]  2.41181291  0.38499825 -0.04869589  0.04798073

Of course I need a general solution, since I never know where the first non-NA value starts. How would I go about doing this? I.e., how do I find the index position of the first non-NA value?

For completeness, my data is actually arranged in a data frame with lots of these vectors in columns, and each vector can have a different non-NA starting position. Also, once the data starts, there may be sporadic NAs further down, so simply counting the NAs is not a solution.

Is there an efficient way to do this that stops searching when it finds the first one? — Alex Brown, Jun 12 '13 at 16:59

score 80 · Accepted Answer · answered Jul 24 '11 at 18:25

Use a combination of is.na and which to find the non-NA index locations.

NonNAindex <- which(!is.na(z))
firstNonNA <- min(NonNAindex)

# set the next 3 observations to NA
is.na(z) <- seq(firstNonNA, length.out=3)
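
For the data-frame case mentioned in the question, the same idea could be wrapped in a small helper and applied column by column. This is only a sketch: `blank_first_n` and the example data frame `df` are hypothetical names that do not appear in the original answer, and the helper assumes that a column with no non-NA values should be returned unchanged.

```r
# Hypothetical helper: set n consecutive positions to NA,
# starting at the first non-NA value of z.
blank_first_n <- function(z, n = 3) {
  first <- which(!is.na(z))[1]        # NA if z has no non-NA values
  if (is.na(first)) return(z)         # nothing to blank
  idx <- seq(first, length.out = n)
  idx <- idx[idx <= length(z)]        # do not run past the end of z
  is.na(z) <- idx
  z
}

# Made-up example data: each column starts its data at a different position
df <- data.frame(
  a = c(NA, NA, 1, 2, 3, 4, 5, 6),
  b = c(NA, 1, 2, NA, 4, 5, 6, 7)
)
df[] <- lapply(df, blank_first_n, n = 3)
df
```

Using `which(!is.na(z))[1]` rather than `min()` means an all-NA column yields `NA` instead of `Inf` with a warning, which makes the guard straightforward.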

Joshua Ulrich

Dang, this was my second guess. Wanted to be fancy with `rle()` but I like this solution better. – Roman Luštrik Jul 24 '11 at 18:28
Perfect thanks. After some thought I came up with min((1:length(z))[!is.na(z)]), but of course this which idea is much better. Perfect – Thomas Browne Jul 24 '11 at 19:43
Is `firstNonNA <- NonNAindex[1]` faster? Would I run into some problem with using `[1]` vs. `min()`? – Florian Jenn Mar 11 '13 at 15:06
@FlorianJenn: yes, that would likely be faster, especially for larger vectors. I can't immediately think of a problem of using it over `min`. – Joshua Ulrich Mar 12 '13 at 16:00
For those who just want to remove all NAs (a bit different from what this question is asking): `x <- c(NA, "B", "C", "D"); x[!is.na(x)]` – carbocation Sep 22 '18 at 09:03

score 27 · Answer 2 · answered Jul 24 '11 at 18:43

Similar idea to that of @Joshua, but using which.min()

## dummy data
set.seed(1)
dat <- runif(10)
dat[seq_len(sample(10, 1))] <- NA

## start of data
start <- which.min(is.na(dat))

which gives:

> (start <- which.min(is.na(dat)))
[1] 4

Use this to set start:(start+2) to NA

is.na(dat) <- seq(start, length.out = 3)

resulting in:

> dat
 [1]         NA         NA         NA         NA         NA
 [6]         NA 0.94467527 0.66079779 0.62911404 0.06178627
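
A quick check of how `which.min(is.na(x))` behaves at the edges may help when deciding whether to rely on it. The vectors and the `first_non_na` wrapper below are made up for illustration. Because `which.min()` returns the position of the first `FALSE`, a vector with no non-NA values returns 1 rather than signalling that nothing was found.

```r
x1 <- c(NA, NA, 5, NA, 7)    # leading NAs, then a later NA
which.min(is.na(x1))         # position of the first FALSE in is.na(x1)

x2 <- c(NA, NA, NA)          # no non-NA values at all
which.min(is.na(x2))         # every element is TRUE, so index 1 is returned

# One possible guard (an assumption about the desired behaviour):
# report NA when the vector contains no data.
first_non_na <- function(x) {
  if (all(is.na(x))) NA_integer_ else which.min(is.na(x))
}
first_non_na(x2)
```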

Gavin Simpson

even cleaner. Thanks, and also for the continuation of the answer. – Thomas Browne Jul 24 '11 at 19:43
+1, but I'm not sure about cleaner. It's shorter but may be less clear to people who don't realize `which.min` coerces `TRUE` and `FALSE` to `1` and `0`, respectively. – Joshua Ulrich Jul 25 '11 at 02:45
@Joshua agreed, it also relies on the behaviour that which.min returns the first of any tied minima. Not sure shorter deserves the accept. – Gavin Simpson Jul 25 '11 at 06:43
This one seems to struggle with instances where NAs are followed by non-NAs and then you have NAs here and there. The index returned is not applicable. The solution detailed by Joshua works as expected. – Matteo Castagna Jan 22 '18 at 13:44
@MatteoCastagna This works for the OPs example and Q, where `NA`s are at the front of the vector. As I mention in the comments, this relies on behaviour of `which.min()`, which is exactly the reason for it failing in the situation you describe. – Gavin Simpson Jan 22 '18 at 15:53

dww · Answer 3 · 2019-07-28T11:39:30.653

17

If dealing with large data, Position can be considerably faster than which when the first non-NA value occurs early, because it only evaluates until a match is found, rather than evaluating the whole vector.

x=c(rep(NA,3),1:1e8)
Position(function(x) !is.na(x), x)
# 4

We can assign NA to the following N values (or the end of the vector, whichever comes first) by

pos = Position(function(x)!is.na(x), x)
x[pos:min(pos+N-1, length(x))] <- NA
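
Put together with a concrete `N`, the steps above might look like the following sketch. The small vector and `N <- 3` are illustrative values only. `Negate(is.na)` is a base R shorthand for the anonymous function, and the `is.na(pos)` guard covers the case where `Position()` finds no match and returns `NA`.

```r
x <- c(rep(NA, 3), 1:10)   # small illustrative vector
N <- 3                     # number of values to blank (illustrative)

pos <- Position(Negate(is.na), x)   # index of first non-NA, or NA if none
if (!is.na(pos)) {
  # blank N values, stopping at the end of the vector if it comes first
  x[pos:min(pos + N - 1, length(x))] <- NA
}
x
```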

edited Jul 28 '19 at 11:39

answered Aug 06 '16 at 06:36

dww

This performs well on large data – stats-hb Oct 18 '17 at 11:49
no need to define a new function, you can use `complete.cases` – IceCreamToucan Feb 16 '18 at 01:35

InColorado · Answer 4 · 2017-05-19T22:31:24.443

2

na.trim() in the zoo package can help.

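library(zoo)
dummy.data <- c(rep(NA, 5), 1:7, NA)
# number of leading NAs = original length minus length after trimming the left side
x <- length(dummy.data) - length(na.trim(dummy.data, sides = "left"))
# blank the first three values after the leading NAs
dummy.data[(x+1):(x+3)] <- NA
dummy.data
# [1] NA NA NA NA NA NA NA NA  4  5  6  7 NA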

edited May 19 '17 at 22:31

answered May 19 '17 at 22:10

InColorado

score 2 · Answer 5 · answered Jul 24 '11 at 18:26

I would do it something along the lines of

# generate some data
tb <- runif(10)
tb[1:3] <- NA

# I convert vector to TRUE/FALSE based on whether it's NA or not
# rle function will tell you when something "changes" in the vector
# (in our case from TRUE to FALSE)
tb.rle <- rle(is.na(tb))

# length of the leading run of NAs (this assumes the vector starts with NA);
# your first non-NA value is one position after it, so +1
tb.rle$lengths[1]

# you can now subset your vector with the first non-NA value
# and do with it whatever you want. I assign it a fantastic 
# non-believable number
tb[tb.rle$lengths[1] + 1] <- 42
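
The `+ 1` only points at the first non-NA value when the vector actually begins with a run of NAs. A sketch that checks `values[1]` first is shown here; the names `r` and `first` and the example vector are illustrative, and a non-empty vector is assumed.

```r
tb <- c(NA, NA, NA, 0.5, 0.2, NA, 0.9)   # made-up data
r <- rle(is.na(tb))

# If the first run is a run of NAs, the data starts right after it;
# otherwise the very first element is already non-NA.
first <- if (r$values[1]) r$lengths[1] + 1L else 1L

# An all-NA vector would give an index past the end, so treat it as "not found".
if (first > length(tb)) first <- NA_integer_
first
```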

score -2 · Answer 6 · answered May 21 '18 at 11:35

You can also use the replace() function directly. I know there is already an accepted answer, but replace() is well suited to this kind of task.

For example:

A <- c(1,2,3,4,5,NA,58,NA,98,NA,NA,NA)
which(is.na(A))
A <- replace(A,1:3,NA)
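
To follow the question more closely, the positions passed to `replace()` could be computed rather than hard-coded. A sketch, using a made-up vector `B` and assuming at least three elements follow the first non-NA value (otherwise `replace()` would extend the vector):

```r
B <- c(NA, NA, 1, 2, 3, 4, NA, 6)        # made-up data
first <- which(!is.na(B))[1]             # position of the first non-NA value
B <- replace(B, seq(first, length.out = 3), NA)
B
```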

Bharat Kaushik
