R: In dataframe: set first non-NA value in column to NA

Question

I have a large dataframe, 300+ columns (time series) with about 2600 observations. The columns are filled with a lot of NA's and then a short time series, and then typically NA's again. I would like to find the first non-NA value in each column and replace it with NA.

This is what I'm hoping to achieve, only with a much bigger dataframe:

Before:

   x1 x2 x3 x4
1  NA NA NA NA
2  NA NA NA NA
3   1  1 NA NA
4   2  2  1  1
5   3  3  2  2
6   4  4  3  3
7   5  5  4  4
8   6  6  5  5
9   7  7  6  6
10  8  8  7  7
11  9  9 NA NA
12 10 10 NA NA
13 NA NA NA NA
14 NA NA NA NA

After:

   x1 x2 x3 x4
1  NA NA NA NA
2  NA NA NA NA
3  NA NA NA NA
4   2  2 NA NA
5   3  3  2  2
6   4  4  3  3
7   5  5  4  4
8   6  6  5  5
9   7  7  6  6
10  8  8  7  7
11  9  9 NA NA
12 10 10 NA NA
13 NA NA NA NA
14 NA NA NA NA

I've searched around and found a way to do this for each column, but my efforts to apply it to the whole dataframe has proven difficult.

I have created an example dataframe to reproduce my original dataframe:

#Dataframe with NA
x1=x2=c(NA,NA,1:10,NA,NA)
x3=x4=c(NA,NA,NA,1:7,NA,NA,NA,NA)
df=data.frame(x1,x2,x3,x4)

I have used this to replace the first value with NA in 1 column (provided by @Joshua Ulrich here), however I would like to apply it to all columns without manually changing 300+ codes:

NonNAindex <- which(!is.na(df[,1]))
firstNonNA <- min(NonNAindex)
is.na(df[,1]) <- seq(firstNonNA, length.out=1)

I have tried to set the above as a function and run it for all columns with apply/lapply, as well as a for loop, but haven't really figured out how to apply the changes to my dataframe. I'm sure there is something I've completely overlooked as I'm just taking my first small steps in R.

All suggestions would be highly appreciated!

akrun · Accepted Answer · 2017-10-24T00:59:23.250

5

We can use base R

df1[] <- lapply(df1, function(x) replace(x, which(!is.na(x))[1], NA))
df1
#   x1 x2 x3 x4
#1  NA NA NA NA
#2  NA NA NA NA
#3  NA NA NA NA
#4   2  2 NA NA
#5   3  3  2  2
#6   4  4  3  3
#7   5  5  4  4
#8   6  6  5  5
#9   7  7  6  6
#10  8  8  7  7
#11  9  9 NA NA
#12 10 10 NA NA
#13 NA NA NA NA
#14 NA NA NA NA

Or as @thelatemail suggested

df1[] <- lapply(df1, function(x) replace(x, Position(Negate(is.na), x), NA))

edited Oct 24 '17 at 00:59

answered Oct 24 '17 at 00:21

akrun

874,273
37
540
662

1

Despite my best efforts I can't come up with anything markedly quicker. Maybe `df[] <- lapply(df, function(x) replace(x, Position(Negate(is.na), x), NA) )` might be nice, which means you can add a bit of overhead and do `df[] <- Map(replace, df, lapply(df, Position, f=Negate(is.na)), NA)` – thelatemail Oct 24 '17 at 00:54

markdly · Answer 2 · 2017-10-24T00:24:12.333

Since you would like to do this for all columns, you could use the mutate_all function from dplyr. See http://dplyr.tidyverse.org/ for more information. In particular, you may want to look at some of the examples shown here.

library(dplyr)
mutate_all(df, funs(if_else(row_number() == min(which(!is.na(.))), NA_integer_, .)))
#>    x1 x2 x3 x4
#> 1  NA NA NA NA
#> 2  NA NA NA NA
#> 3  NA NA NA NA
#> 4   2  2 NA NA
#> 5   3  3  2  2
#> 6   4  4  3  3
#> 7   5  5  4  4
#> 8   6  6  5  5
#> 9   7  7  6  6
#> 10  8  8  7  7
#> 11  9  9 NA NA
#> 12 10 10 NA NA
#> 13 NA NA NA NA
#> 14 NA NA NA NA

R: In dataframe: set first non-NA value in column to NA

2 Answers2