
I have an array from which I have to omit NA values. I know that it is an array full of matrices where every row has exactly one NA value. My approach works well when the matrices have more than two columns, but apply() drops one dimension when there are only two columns (because after omitting the NA values, only one column remains). As this step is part of a much larger codebase, I would like to avoid recoding the rest and instead make this step robust to the case where the number of columns is two. Here is a simple example:

#create an array
arr1 <- array(rnorm(3000),c(500,2,3))

#randomly distribute 1 NA value per row of the array
for(i in 1:500){
  arr1[i,,sample(3,1)] <- NA
}

#omit the NAs from the array
arr1.apply <- apply(arr1, c(1,2),na.omit)

#we lose no dimension as every dimension >1
dim(arr1.apply)
[1]   2 500   2


#now repeat with a 500x2x2 array

#create an array
arr2 <- array(rnorm(2000),c(500,2,2))

#randomly distribute 1 NA value per row of the array
for(i in 1:500){
  arr2[i,,sample(2,1)] <- NA
}

#omit the NAs from the array
arr2.apply <- apply(arr2, c(1,2),na.omit)

#we lose one dimension because the last dimension collapses to size 1
dim(arr2.apply)
[1] 500   2

I do not want apply() to drop the last dimension as it breaks the rest of my code.

I am aware that this is a known issue with apply(); however, I would like to resolve the problem in this very step, so any help would be appreciated. So far I've tried wrapping apply() in an array() call with the dimensions that should result, but I think this mixes up the values in the matrix in an undesirable way.
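A quick way to test whether wrapping in array() scrambles anything is to compare the wrapped result with the original, cell by cell. The sketch below uses a small made-up 4 x 2 x 2 array (the names `arr`, `res`, `wrapped` and `ok` are only for illustration) and assumes the target layout is the one apply() gives in the larger case, where the values returned by the function sit in the first dimension, i.e. c(k, rows, cols):

```r
# Small deterministic example: a 4 x 2 x 2 array with one NA per [i, j, ] fibre
arr <- array(as.numeric(1:16), c(4, 2, 2))
for (i in 1:4) {
  arr[i, , (i %% 2) + 1] <- NA   # alternate which slice is blanked out
}

res <- apply(arr, c(1, 2), na.omit)                # collapses to a 4 x 2 matrix
wrapped <- array(res, dim = c(1, dim(arr)[1:2]))   # put the length-1 dimension first

# Each wrapped[1, i, j] should be the single non-NA value of arr[i, j, ]
ok <- TRUE
for (i in 1:4) for (j in 1:2) {
  ok <- ok && wrapped[1, i, j] == arr[i, j, !is.na(arr[i, j, ])]
}
dim(wrapped)
ok
```

If `ok` comes out TRUE, the values are still in place with this choice of dimensions. Whether c(1, rows, cols) is the layout the rest of the code expects is an assumption that has to be checked against the larger case.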

Thanks for your help.

yrx1702
  • Is there a specific reason to use array as opposed to list, matrix, dataframe? – Tony Hellmuth May 08 '18 at 10:09
  • The input files I get are in the form of an array, however, it would be possible to transform them to a list and then back to an array. The result has to be in array form. – yrx1702 May 08 '18 at 10:12
  • All good, just use this code: `apply(arr1,3,function(x) na.omit(x))` to get rid of the NAs for each matrix in the array. That way it removes the NA rows from each matrix in the array separately, hopefully as wanted! – Tony Hellmuth May 08 '18 at 10:13
  • That does not give the desired result, unfortunately. It returns a list of (in this case) three elements with different lengths instead of an array with equally sized matrices where one dimension has lost one column. – yrx1702 May 08 '18 at 10:20
  • Damn your arrays. – Tony Hellmuth May 08 '18 at 10:24
  • Know that feeling. – yrx1702 May 08 '18 at 10:37
  • Hopefully there is a simple `dplyr` trick - Good luck! – Tony Hellmuth May 08 '18 at 10:51

1 Answer

I propose a stupid solution, but I think you have no choice if you want to keep it this way:

# if only one value per cell remains, apply() drops that dimension, so add it back
arr1.apply <- if (dim(arr1)[3] > 2) {
  apply(arr1, c(1, 2), na.omit)
} else {
  array(apply(arr1, c(1, 2), na.omit), dim = c(1, dim(arr1)[1:2]))
}
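The same idea can probably be written without the branch by always resetting the dimensions after apply(). This is only a sketch: `omit_na_keep_dim` is a made-up helper name, and it assumes every cell keeps the same number of non-NA values, as in the question:

```r
# Hypothetical helper (not part of base R): drop the NAs along the third
# dimension and always return a 3-d array of dim c(k, nrow, ncol), where k is
# the number of non-NA values that remain per cell.
omit_na_keep_dim <- function(arr) {
  res <- apply(arr, c(1, 2), function(x) x[!is.na(x)])
  # Assumption: every cell keeps the same number of values; otherwise
  # apply() returns a list and the reshape below would not make sense.
  stopifnot(!is.list(res))
  k <- length(res) / prod(dim(arr)[1:2])
  dim(res) <- c(k, dim(arr)[1:2])
  res
}

# Same setup as the 500x2x2 case in the question
set.seed(1)
arr2 <- array(rnorm(2000), c(500, 2, 2))
for (i in 1:500) {
  arr2[i, , sample(2, 1)] <- NA
}
dim(omit_na_keep_dim(arr2))
# [1]   1 500   2
```

In the case with more than two slices, apply() already returns c(k, nrow, ncol), so resetting `dim` there should change nothing.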
denis