I am working with a matrix containing a large number of NA. I would like to record the length of each sequence of NA in a new matrix.

The following example should make this clearer.

# Generate a random 5x5 population matrix with 15 NAs
M <- matrix(sample(1:9, 25, TRUE), 5)
M[sample(length(M), 15)] <- NA
dimnames(M) <- list(paste0("City", seq_len(nrow(M))), paste0("Year", seq_len(ncol(M))))
M

      Year1 Year2 Year3 Year4 Year5
City1     2    NA    NA    NA    NA
City2    NA    NA    NA     6     8
City3     1    NA    NA     6    NA
City4    NA     5    NA    NA     1
City5     8    NA     1    NA     2

The desired output is shown below. For example, 4 4 4 4 denotes a run of four consecutive NAs within a row.

      Year1 Year2 Year3 Year4 Year5
City1     0     4     4     4     4
City2     3     3     3     0     0
City3     0     2     2     0     1
City4     1     0     2     2     0
City5     0     1     0     1     0

Do you have an idea of how I could go about that?

goclem
  • Don't forget that your matrix "R" will be filled with objects of type "character". –  May 08 '15 at 05:35
  • In many ways it may actually be more appropriate to store a second matrix/set of indexes storing the locations/types of missing values. Maybe as an `attribute` to `M`. – thelatemail May 08 '15 at 05:49
  • OK. So you completely changed the original post. Thus, the answers below are completely off topic now. –  May 08 '15 at 11:26
  • I'm a bit confused by the question and the edits. It might be better to leave the original question unchanged, accept the most appropriate answer whether or not it fits your final purpose, and ask a new one. BTW, I'm guessing you'll find `na.approx` from package "zoo" as well as searches on SO like "interpolate/impute missing values" helpful. – alexis_laz May 08 '15 at 13:10
  • Sorry for the confusion. I'll edit my post again with my initial question and ask new questions in a different post. – goclem May 08 '15 at 13:14
  • The `na.approx` function from the package `zoo` seems to do the trick. – goclem May 11 '15 at 09:22
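
The suggestion in the comments to keep the locations of the missing values alongside the matrix could be sketched roughly as follows. This is only an illustration: the attribute name `na_pos` and the fill value 0 are assumptions made for the example, not something specified in the thread.

```r
# Small example matrix with two missing values
M <- matrix(c(2, NA, 1, NA), nrow = 2)

# Record the (row, col) positions of the NAs before replacing them
na_pos <- which(is.na(M), arr.ind = TRUE)

# Fill the NAs (0 is an arbitrary placeholder) and attach the positions;
# "na_pos" is a hypothetical attribute name chosen for this sketch
filled <- M
filled[is.na(filled)] <- 0
attr(filled, "na_pos") <- na_pos

attr(filled, "na_pos")
```

Note that many matrix operations drop custom attributes, so the positions may need to be re-attached after further processing.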

2 Answers

Not the most efficient code ever:

r1=c(1,1,NA,1,1)
r2=c(1,NA,NA,1,1)
r3=c(1,NA,NA,NA,1)
r4=c(NA,NA,1,1,1)
r5=c(1,1,1,NA,NA)
M=rbind(r1,r2,r3,r4,r5)

As @Pascal pointed out, your approach will convert the entire matrix to characters, so you can set the 1s to 0s instead and do this:

M[M == 1] <- 0

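(xx <- t(apply(M, 1, function(x) {
  s <- sum(is.na(x))  # number of NAs in the row
  # Hard-coded for this example: a leading run maps to 4 ("d"), a trailing run to 5 ("e");
  # otherwise the NA count is used, which assumes a single run of NAs per row
  if (is.na(x[1])) x[is.na(x)] <- rep(4, s) else
    if (is.na(tail(x, 1))) x[is.na(x)] <- rep(5, s) else
    x[is.na(x)] <- s
  x
})))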

#    [,1] [,2] [,3] [,4] [,5]
# r1    0    0    1    0    0
# r2    0    2    2    0    0
# r3    0    3    3    3    0
# r4    4    4    0    0    0
# r5    0    0    0    5    5

This is your desired output. If you don't believe me, convert the 0s back to 1s and assign the letters based on the integers:

xx[xx > 0] <- letters[xx[xx > 0]]
xx[xx == '0'] <- 1


r1=c(1,1,"a",1,1)
r2=c(1,"b","b",1,1)
r3=c(1,"c","c","c",1)
r4=c("d","d",1,1,1)
r5=c(1,1,1,"e","e")
R=rbind(r1,r2,r3,r4,r5)


identical(R, xx)
# [1] TRUE
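
For the numeric version of the question (the length of each run of NAs, 0 elsewhere), one possible sketch uses base R's `rle()` on each row, which also handles rows with several separate runs. The matrix below is copied from the question's example; the helper name `na_run_lengths` is made up for this illustration.

```r
# Example matrix from the question
M <- matrix(c( 2, NA, NA, NA, NA,
              NA, NA, NA,  6,  8,
               1, NA, NA,  6, NA,
              NA,  5, NA, NA,  1,
               8, NA,  1, NA,  2),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("City", 1:5), paste0("Year", 1:5)))

# Hypothetical helper: for one vector, return the length of the NA run
# each element belongs to, and 0 for non-missing elements
na_run_lengths <- function(x) {
  runs <- rle(unname(is.na(x)))
  rep(ifelse(runs$values, runs$lengths, 0L), times = runs$lengths)
}

# Apply row by row; apply() returns the results as columns, hence t()
R <- t(apply(M, 1, na_run_lengths))
dimnames(R) <- dimnames(M)
R
```

Because `rle()` works on consecutive equal values, no special cases are needed for runs at the start or end of a row.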
rawr

This is another basis for a function that would be applied over each row. I tried, but couldn't avoid a for loop:

x = c(1,NA,1,NA,NA,1,NA,NA,NA,1,NA,NA,NA,NA)

#Find the Start and End of each sequence of NA's (Vectorized)
(start <- is.na(x) * c(T,!is.na(x[-length(x)])))
#>  [1] 0 1 0 1 0 0 1 0 0 0 1 0 0 0

(end <- is.na(x) * c(!is.na(x[-1]),T))
#>  [1] 0 1 0 0 1 0 0 0 1 0 0 0 0 1

# The difference between the start and end of each sequence, plus 1, is the sequence length
wStart <- which(!!start)
wEnd <- which(!!end)
sequenceLength <- wEnd - wStart + 1

# Replace each sequence of NAs with its class
for(i in seq_along(wStart))
    x[`:`(wStart[i],wEnd[i])] <- letters[sequenceLength[i]]

x
#> [1] "1" "a" "1" "b" "b" "1" "c" "c" "c" "1" "d" "d" "d" "d"
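
The loop over the start and end positions can probably be avoided: since the NA positions appear in order, run by run, repeating each run's label as many times as the run is long lines the labels up with the NAs. A minimal sketch under that assumption (the names `na`, `runLabel` and `out` are introduced here for illustration):

```r
x <- c(1, NA, 1, NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, NA)

na <- is.na(x)
# Start: an NA whose predecessor is not NA (or the first element)
wStart <- which(na & c(TRUE, !na[-length(na)]))
# End: an NA whose successor is not NA (or the last element)
wEnd <- which(na & c(!na[-1], TRUE))
sequenceLength <- wEnd - wStart + 1

# One label per NA position: each run's letter repeated run-length times
runLabel <- rep(letters[sequenceLength], times = sequenceLength)

out <- as.character(x)
out[na] <- runLabel
out
```

Dropping `letters[...]` and assigning `rep(sequenceLength, times = sequenceLength)` into a numeric copy of `x` would give the run lengths themselves instead of letter classes.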

Applied over each row of `M`, as in:

(xx <- t(apply(M, 1, function(x) {
    # Positions where a sequence of NAs starts and ends
    wStart <- which(!!(is.na(x) * c(T,!is.na(x[-length(x)]))))
    wEnd <- which(!!(is.na(x) * c(!is.na(x[-1]),T)))
    # Replace each sequence with the letter matching its length
    for(i in seq_along(wStart))
        x[`:`(wStart[i],wEnd[i])] <- letters[wEnd[i] - wStart[i] + 1]
    return(x)
})))
Jthorpe