
I'm working with a data frame that looks basically like this one:


   X1   X2   X3 X4
x1  a    b   NA  c
x2  d   NA   NA  e
x3  f    g    h  i
x4  j   NA    k  l
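To make the example reproducible, the input above could be built like this. This is only a sketch: the name `df` and the use of character (rather than factor) columns are assumptions, not something stated in the question.

```r
# Reproducible version of the example input (assumed: character columns,
# row names x1..x4 and column names X1..X4 copied from the table)
df <- data.frame(
  X1 = c("a", "d", "f", "j"),
  X2 = c("b", NA,  "g", NA),
  X3 = c(NA,  NA,  "h", "k"),
  X4 = c("c", "e", "i", "l"),
  row.names = c("x1", "x2", "x3", "x4"),
  stringsAsFactors = FALSE  # keep character columns in R versions before 4.0
)
df
```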

Within each row, I want to shift every cell that has a value to the left, keeping the values in their original order. In the end, all cells with a value should be gathered on the left and all NA cells on the right.

In the end, the data frame should look like this:


   X1 X2 X3 X4
x1  a  b  c NA
x2  d  e NA NA
x3  f  g  h  i
x4  j  k  l NA

Unfortunately, I have no idea how to do it.

Thank you very much for your help. (Maybe you could also explain what your code is doing?)

Rami

David Arenburg
Rami Al-Fahham
  • I wonder if this is a duplicate of [this](http://stackoverflow.com/questions/25869011/move-nas-within-dataframe-in-r/), although we didn't see @Ananda's new function there – David Arenburg Oct 30 '14 at 11:32

5 Answers


You could also try the replacement function `length<-`, which pads a vector with NA when it is lengthened:

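# For each row (MARGIN = 1), na.omit() drops the NAs and `length<-` pads the
# shortened vector back to its original length, appending NAs at the end.
# apply() returns one column per row, hence t(); df[] <- keeps df a data frame.
df[] <- t(apply(df, 1, function(x) `length<-`(na.omit(x), length(x))))
df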
#    X1 X2   X3   X4
# x1  a  b    c <NA>
# x2  d  e <NA> <NA>
# x3  f  g    h    i
# x4  j  k    l <NA>
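A quick way to see what `length<-` does here, on a single row (the vector `x` is just the example's second row typed out by hand): na.omit() drops the NAs, and lengthening the result back to its original size pads it with NA at the end. As far as I know, the replacement form of `length<-` also drops the `na.action` attribute that na.omit() attaches, since it keeps only names.

```r
x <- c("d", NA, NA, "e")               # second row of the example, by hand
kept <- na.omit(x)                     # "d" "e", plus an na.action attribute
padded <- `length<-`(kept, length(x))  # lengthen back to 4: pads with NA
padded                                 # the values c("d", "e", NA, NA)
```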
David Arenburg

You can grab my naLast function from my "SOfun" package.

The result would be a matrix, but you can easily wrap it in as.data.frame if you want:

as.data.frame(naLast(mydf, by = "row"))
#    X1 X2   X3   X4
# x1  a  b    c <NA>
# x2  d  e <NA> <NA>
# x3  f  g    h    i
# x4  j  k    l <NA>

Install and load the package with:

library(devtools)
install_github("mrdwab/SOfun")
library(SOfun)
A5C1D2H2I1M1N2O1R2T1
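yourdata[] <- t(apply(yourdata, 1, function(x) {
  c(x[!is.na(x)], x[is.na(x)])
}))

This should work: for each row, it replaces the row with a vector made of, first, the values that are not NA, then the NA values. Because apply() over rows returns each result as a column, t() transposes the result back into rows.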
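Two small checks of the pieces, on made-up data (`x` and `m` are illustrative names): the first line shows how one row is reordered, and the rest shows why the result of apply() has to be transposed.

```r
x <- c("d", NA, NA, "e")
c(x[!is.na(x)], x[is.na(x)])  # non-NA values first, then the NAs

# apply() over rows (MARGIN = 1) returns each row's result as a column
m <- matrix(c("a", NA, "c",
              "d", "e", NA), nrow = 2, byrow = TRUE)
res <- apply(m, 1, function(x) c(x[!is.na(x)], x[is.na(x)]))
dim(res)       # 3 x 2: one column per original row
out <- t(res)  # transpose back to 2 x 3, one row per original row
out
```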

Cath

You can do this without looping in R. Let's assume you have a matrix m, which is probably more appropriate than a data.frame in this case. Then we just use order() to sort the elements by row and, within each row, by is.na(), so that NA values go last. Since sorting in R is stable, the order of the non-NA values within a row is preserved.

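# sort the element indices by row, then by is.na() (FALSE before TRUE)
v <- m[order(row(m), is.na(m))]
# v holds row 1's values, then row 2's, ...; with ncol(m) rows,
# each column of v is one original row
dim(v) <- rev(dim(m))
t(v)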
##     [,1] [,2] [,3] [,4]
## [1,] "a"  "b"  "c"  NA  
## [2,] "d"  "e"  NA   NA  
## [3,] "f"  "g"  "h"  "i" 
## [4,] "j"  "k"  "l"  NA  
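For a matrix that is not square, v has to be reshaped to ncol(m) x nrow(m) before transposing, because each row's values end up stored consecutively. Here is the same idea wrapped in a function; the name `na_last_rows` and the test matrix are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: move NAs to the end of each row of any r x c matrix
na_last_rows <- function(m) {
  # sort element indices by row, then by NA flag; order() is stable,
  # so the non-NA values keep their left-to-right order
  v <- m[order(row(m), is.na(m))]
  # each original row now occupies ncol(m) consecutive positions in v
  dim(v) <- c(ncol(m), nrow(m))
  out <- t(v)
  dimnames(out) <- dimnames(m)  # keep row/column names, if any
  out
}

m <- matrix(c("a", NA, "c",
              NA,  NA, "f"), nrow = 2, byrow = TRUE)
na_last_rows(m)
```

With the question's data frame, something like `df[] <- na_last_rows(as.matrix(df))` should work, since as.matrix() turns the character columns into a character matrix.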

To get good performance over millions of rows, you would probably want to use radix sort. Unfortunately, at the time of writing it was limited to 100,000 unique values (I'm not sure why), but it would look like this:

v2 <- m[sort.list(is.na(m) + (row(m)-1L)*2L + 1L, method="radix")]
dim(v2) <- rev(dim(m))
t(v2)
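In recent R versions (3.3.0 and later, if I remember correctly), order() itself accepts method = "radix", which handles several keys and is stable, so the row/NA ordering can be expressed without building a combined integer key. A sketch on a small made-up matrix; I haven't benchmarked it against the sort.list() version.

```r
m <- matrix(c("a", NA, "c",
              NA,  NA, "f"), nrow = 2, byrow = TRUE)
# same keys as before (row, then NA flag), using order()'s radix method
o <- order(row(m), is.na(m), method = "radix")
v <- m[o]
dim(v) <- c(ncol(m), nrow(m))  # one column per original row
res <- t(v)
res
```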
Michael Lawrence

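If you don't mind a for loop (note that sort() puts each row's values in alphabetical order, so this only reproduces the desired output when a row's non-NA values are already alphabetical, as they are in this example):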

ddf
   X1   X2   X3 X4
x1  a    b <NA>  c
x2  d <NA> <NA>  e
x3  f    g    h  i
x4  j <NA>    k  l

nddf = ddf
for(i in 1:nrow(ddf))
 # unlist() turns the one-row data frame into a vector (assumes character columns)
 nddf[i,] = sort(unlist(ddf[i,]), na.last=TRUE)

nddf
   X1 X2   X3   X4
x1  a  b    c <NA>
x2  d  e <NA> <NA>
x3  f  g    h    i
x4  j  k    l <NA>
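A small counterexample (made-up row values) showing why sort() is risky here: it puts the values in alphabetical order instead of keeping their column order, while shifting the non-NA values keeps them as they were.

```r
row_vals <- c("z", NA, "a", NA)

# sort() orders alphabetically, so "a" jumps ahead of "z"
sorted <- sort(row_vals, na.last = TRUE)

# shifting keeps the original left-to-right order of the non-NA values
shifted <- c(row_vals[!is.na(row_vals)], row_vals[is.na(row_vals)])

sorted
shifted
```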

If you don't want to sort, so that the values keep their original order, you can shift them with a small helper function:

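rowfn = function(rr){
 rr2 = rr; j = 1
 # copy each non-NA value to the next free position on the left
 for(i in 1:length(rr)) if(!is.na(rr[i])){ rr2[j] = rr[i]; j = j+1 }
 # fill the remaining positions on the right with NA
 if(j < (length(rr)+1)) for(k in j:length(rr)) rr2[k] = NA
 rr2
 }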

ddf
   X1   X2   X3 X4
x1  a    b <NA>  c
x2  d <NA> <NA>  e
x3  f    g    h  i
x4  j <NA>    k  l

nddf = ddf
for(i in 1:nrow(ddf)) nddf[i,] = rowfn(ddf[i,])

nddf
   X1 X2   X3   X4
x1  a  b    c <NA>
x2  d  e <NA> <NA>
x3  f  g    h    i
x4  j  k    l <NA>
rnso