Put all the missing values to the right side of data frame

Question

I have a lot of missing data in a data frame (a data.table), and I need to move all of the missing values to the right-hand side of each row. Is there a way to do this?

Sample data

as.data.table(structure(list(`1` = c(NA_integer_, NA_integer_), `2` = c(NA, 
1L), `3` = c(1L, 1L), `4` = c(0L, NA), `5` = c(NA_integer_, NA_integer_
)), row.names = c(NA, -2L), class = c("data.table", "data.frame"
)))

The expected output is:

1, 0, NA, NA, NA
1, 1, NA, NA, NA

Anoushiravan R · Answer 1 · 2021-05-29T16:45:23.997

Update: I made a few modifications so that the column names come out in order, but that can be changed.

You can use the following solution. There may be an easier way to go about it, but this is what I could think of for now:

library(dplyr)
library(purrr)

df %>%
  pmap(., ~ c(c(...)[!is.na(c(...))], c(...)[is.na(c(...))])) %>%
  exec(rbind, !!!.) %>%
  as_tibble() %>%
  set_names(1:length(.))

# A tibble: 2 x 5
    `1`   `2`   `3`   `4`   `5`
  <int> <int> <int> <int> <int>
1     1     0    NA    NA    NA
2     1     1    NA    NA    NA

score 3 · Accepted Answer · answered May 29 '21 at 16:10

Since it's a data.table object,

library(data.table)
dcast(melt(copy(DT)[, rn := seq_len(.N)], "rn"
          )[, variable := as.character(rank(is.na(value), ties.method = "first")), by = rn],
      rn ~ variable, value.var = "value")[, rn := NULL][]
#        1     2     3     4     5
#    <int> <int> <int> <int> <int>
# 1:     1     0    NA    NA    NA
# 2:     1     1    NA    NA    NA
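
The one-liner above can be easier to follow when unpacked into its three steps. This is only a sketch of the same melt/rank/dcast idea, rebuilt on the question's sample data; the intermediate names `long`, `pos`, and `wide` are introduced here for illustration and are not part of the original answer.

```r
library(data.table)

# Sample data from the question, rebuilt by hand
DT <- data.table(`1` = c(NA_integer_, NA_integer_), `2` = c(NA, 1L),
                 `3` = c(1L, 1L), `4` = c(0L, NA),
                 `5` = c(NA_integer_, NA_integer_))

# Step 1: add a row id and reshape to long format (one row per cell)
long <- melt(copy(DT)[, rn := seq_len(.N)], id.vars = "rn")

# Step 2: within each original row, rank non-missing cells ahead of
# missing ones; ties.method = "first" keeps the original left-to-right order
long[, pos := rank(is.na(value), ties.method = "first"), by = rn]

# Step 3: reshape back to wide, using the new position as the column name
wide <- dcast(long, rn ~ pos, value.var = "value")[, rn := NULL]
print(wide)
```

Keeping `long` around also makes it easy to inspect which original column each value came from before it is shifted.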

Or perhaps a less-efficient method:

DT[] <- as.data.table(t(apply(DT, 1, function(z) z[order(is.na(z))])))
DT
#        1     2     3     4     5
#    <int> <int> <int> <int> <int>
# 1:     1     0    NA    NA    NA
# 2:     1     1    NA    NA    NA
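
To see why `z[order(is.na(z))]` pushes the missing values to the right, it may help to look at a single row in isolation. The sketch below uses the second row of the sample data; the names `idx` and `shifted` are illustrative only.

```r
# Second row of the sample data, as a named integer vector
z <- c(`1` = NA, `2` = 1L, `3` = 1L, `4` = NA, `5` = NA)

# is.na(z) is FALSE for values and TRUE for NAs. order() sorts FALSE before
# TRUE and leaves ties in their original order, so the non-missing values
# keep their relative order and the NAs follow.
idx <- order(is.na(z))
shifted <- unname(z[idx])

print(idx)      # 2 3 1 4 5
print(shifted)  # 1 1 NA NA NA
```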

FYI, in a sense you're defeating the premise of a data.frame structure, where columns are fields/properties. Shifting values between columns like this suggests that either (a) the data really belongs in long format (using melt) and should be kept that way, or (b) you should be dealing with a matrix instead. Perhaps there's a lot more to it than we see; just a thought.
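
If the matrix route in (b) fits your data, one possible sketch is below. It assumes all columns share a type (integer here) and uses a base R ordering idiom rather than anything from the answer above; the names `m`, `ord`, and `res` are made up for the example.

```r
# Sample data as an integer matrix (filled column by column)
m <- matrix(c(NA, NA,  NA, 1L,  1L, 1L,  0L, NA,  NA, NA), nrow = 2,
            dimnames = list(NULL, as.character(1:5)))

# Order all cells by row index first, then by missingness; ties keep their
# original position, which within a row means left-to-right column order
ord <- order(row(m), is.na(m))

# The reordered vector lists row 1 first, then row 2, so fill by row
res <- matrix(m[ord], nrow = nrow(m), byrow = TRUE, dimnames = dimnames(m))
print(res)
```

This avoids a per-row function call, at the cost of being a little less obvious than the `apply()` version.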

answered May 29 '21 at 16:10

r2evans


Dear @r2evans, do you have any recommendations on where to start learning `data.table`? Since I am not familiar with it at all. I'd appreciate any advice. – Anoushiravan R May 29 '21 at 16:14

Honestly, a good start is its webpage https://rdatatable.gitlab.io/data.table/ and all of its [vignettes](https://cran.r-project.org/web/packages/data.table/index.html). From there, I forced myself to start using it on some small personal projects, and then started looking at questions here on SO and attempting to answer them. Since the authors/maintainers and many really-smart-`data.table` people usually beat me to the punch with answers, I learned from their commentary. It's an uphill battle (a different R dialect, really), but things just start "making sense" at some point. – r2evans May 29 '21 at 16:19

One thing I did was maintain a translation list (of sorts) on how to do things in base R, `dplyr` (and `tidyr`), `data.table`, and python pandas; while it's not something that I keep up-to-date or am ready to make public, just *making* that reference guide for myself really helped understand what I wanted from `data.table`. There are still many aspects I have not worked on, but I'm now fairly comfortable with reshaping, basic grouping, non-equi joins, and some of its internal function efficiencies. – r2evans May 29 '21 at 16:21

Thank you very much for your encouraging comments and also the link. Yes you are right I also found SO a great source of learning as in the past three months I learned a great deal in the same way as you mentioned. SO has some terrific data.table users just like yourself. Thank you very much indeed. – Anoushiravan R May 29 '21 at 16:24
I totally understand. This time you had a clearer idea of what to look for and what to learn. I think that's also the case when you start learning a new programming language (although I still have not). I saw some benchmarking attempts and was surprised to learn that it is also faster than `pandas`. – Anoushiravan R May 29 '21 at 16:27

score 2 · Answer 3 · answered May 29 '21 at 16:32

Another pmap-based answer, but this one keeps the original order of the column names in the output:

library(purrr)  # pmap_dfr() also needs dplyr installed

pmap_dfr(dt, ~c(c(...)[!is.na(c(...))], rep(NA, sum(is.na(c(...))))) %>%
           setNames(names(c(...))))

# A tibble: 2 x 5
    `1`   `2`   `3`   `4`   `5`
  <int> <int> <int> <int> <int>
1     1     0    NA    NA    NA
2     1     1    NA    NA    NA
