
Suppose I have the data frame below:

a <- data.frame(A = 1:10, B = 26:35, C = 101:110, D = 1001:1010)
a[c(2,4,7),"A"] <- NA
a

    A  B   C    D
1   1 26 101 1001
2  NA 27 102 1002
3   3 28 103 1003
4  NA 29 104 1004
5   5 30 105 1005
6   6 31 106 1006
7  NA 32 107 1007
8   8 33 108 1008
9   9 34 109 1009
10 10 35 110 1010

I want to know whether any of the loop functions can fill the missing values in column A with the mean of the corresponding values from columns B, C and D. For example, the NA in row 2 should be replaced by 377 (the mean of 27, 102 and 1002).

I can get this to work using a for loop, but I am curious whether the same can be done with the apply functions.

Edit: What if I don't want the mean of all the columns, but only of a few? Say I need the mean of only B and D. I guess rowMeans wouldn't work then.

PratikGandhi

1 Answer


First, a data.frame is not the right way to store entirely numeric data, so convert it to a matrix:

m = as.matrix(a)

From here, we can find the positions of NA values in the matrix

idx = which(is.na(m), arr.ind=TRUE)


     row col
[1,]   2   1
[2,]   4   1
[3,]   7   1

and fill them in

m[idx] <- rowMeans(m[idx[,1], , drop=FALSE], na.rm=TRUE)


        A  B   C    D
 [1,]   1 26 101 1001
 [2,] 377 27 102 1002
 [3,]   3 28 103 1003
 [4,] 379 29 104 1004
 [5,]   5 30 105 1005
 [6,]   6 31 106 1006
 [7,] 382 32 107 1007
 [8,]   8 33 108 1008
 [9,]   9 34 109 1009
[10,]  10 35 110 1010

This will work for NAs in all columns, not just A.
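
To illustrate that claim, here is a minimal sketch that reuses the question's data and adds one extra NA in column C. The extra NA and the name `filled` are assumptions made only for this example. It passes `drop = FALSE` so that selecting a single row would still yield a matrix for rowMeans.

```r
# Data from the question, plus one extra NA in column C (added for illustration)
a <- data.frame(A = 1:10, B = 26:35, C = 101:110, D = 1001:1010)
a[c(2, 4, 7), "A"] <- NA
a[5, "C"] <- NA

m <- as.matrix(a)

# One (row, col) pair per missing cell
idx <- which(is.na(m), arr.ind = TRUE)

# For each missing cell, take the mean of the non-missing values in its row
m[idx] <- rowMeans(m[idx[, 1], , drop = FALSE], na.rm = TRUE)

# Convert back if a data.frame is needed downstream
filled <- as.data.frame(m)
```

Each NA is replaced by the mean of the other values in its own row, whichever column it sits in. A row made up entirely of NAs would come out as NaN.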

If you have more NAs than rows, it should be faster to compute each row mean once and index the result: rowMeans(m, na.rm=TRUE)[ idx[,1] ].
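
The same lookup idea can be adapted to the question's edit, where only some columns should feed the mean. This is a sketch under that assumption, not part of the original answer. The names `use_cols` and `row_means` are hypothetical.

```r
a <- data.frame(A = 1:10, B = 26:35, C = 101:110, D = 1001:1010)
a[c(2, 4, 7), "A"] <- NA
m <- as.matrix(a)

idx <- which(is.na(m), arr.ind = TRUE)

# Hypothetical choice of source columns, taken from the question's edit
use_cols <- c("B", "D")

# Compute every row mean once over the chosen columns,
# then look up the rows that contain a missing cell
row_means <- rowMeans(m[, use_cols, drop = FALSE], na.rm = TRUE)
m[idx] <- row_means[idx[, 1]]
```

So rowMeans still works here, because it is applied to a column subset of the matrix. If every chosen column is NA in some row, that row's fill value is NaN.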


With zoo

As @akrun mentioned, this also works:

library(zoo)
t(na.aggregate(t(m)))
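
Since the question asks specifically about the apply functions, a row-wise apply version in base R might look like the sketch below. The helper `fill_row` is a hypothetical name introduced here, not something from the original answer.

```r
a <- data.frame(A = 1:10, B = 26:35, C = 101:110, D = 1001:1010)
a[c(2, 4, 7), "A"] <- NA

# Hypothetical helper: replace the NAs in one row with the mean
# of that row's non-missing values
fill_row <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# apply() over rows returns each result as a column, hence the t()
filled <- as.data.frame(t(apply(a, 1, fill_row)))
```

This calls the helper once per row, so it is likely slower than the rowMeans approach on large data, but it keeps the per-row logic in one place.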
Frank