Using R to shift values to the left of data.frame

Question

Okay, so I have this data.frame:

        A      B      C
1  yellow purple   <NA>
2    <NA>   <NA> yellow
3  orange yellow   <NA>
4  orange   <NA>  brown
5    <NA>  brown purple
6  yellow purple   pink
7  purple  green   pink
8  yellow   pink  green
9  purple orange   <NA>
10 purple   <NA>  brown

I am interested in taking all the missing values in the first columns and replacing them with the values from the columns to their right, as in rows 2, 4, 5 and 10 below:

        A      B      C
1  yellow purple   <NA>
2  yellow   <NA>   <NA>
3  orange yellow   <NA>
4  orange  brown   <NA>
5   brown purple   <NA>
6  yellow purple   pink
7  purple  green   pink
8  yellow   pink  green
9  purple orange   <NA>
10 purple  brown   <NA>

My idea was to loop over the columns, grab the rows with missing values, and replace them with the values from the column to the right. But that is potentially flawed: what if there were 4 columns and the values in columns 2 and 3 were both NA? Does anyone have an idea for an algorithm that would work?

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12

We can loop over the rows, concatenate the non-NA elements followed by the NA elements, and assign the result back to the dataset:

df[] <-  t(apply(df, 1, function(x) c(x[!is.na(x)], x[is.na(x)])))
df
#        A      B     C
#1  yellow purple  <NA>
#2  yellow   <NA>  <NA>
#3  orange yellow  <NA>
#4  orange  brown  <NA>
#5   brown purple  <NA>
#6  yellow purple  pink
#7  purple  green  pink
#8  yellow   pink green
#9  purple orange  <NA>
#10 purple  brown  <NA>
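
To address the four-column worry in the question, here is a minimal sketch of the same idea on made-up data. The helper name `shift_left` and the data frame `df4` are hypothetical and not part of the original post. Because each row is rebuilt from its non-NA values in one step, it should not matter how many NAs sit next to each other.

```r
# Hypothetical helper: the same logic as the anonymous function above
shift_left <- function(x) c(x[!is.na(x)], x[is.na(x)])

# A single row with NAs in the two middle positions
shift_left(c("red", NA, NA, "blue"))
# [1] "red"  "blue" NA     NA

# Made-up four-column data frame (not from the question)
df4 <- data.frame(
  A = c("red", NA, NA),
  B = c(NA, NA, "green"),
  C = c(NA, "pink", NA),
  D = c("blue", "teal", "gold"),
  stringsAsFactors = FALSE
)

# apply() returns one column per input row, so t() flips it back
df4[] <- t(apply(df4, 1, shift_left))
df4
```

One caveat: `apply` first converts the data frame to a matrix, so if the columns have mixed types (say numeric and character) every value comes back as character.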

data

df <- structure(list(A = c("yellow", NA, "orange", "orange", NA, "yellow", 
"purple", "yellow", "purple", "purple"), B = c("purple", NA, 
"yellow", NA, "brown", "purple", "green", "pink", "orange", NA
 ), C = c(NA, "yellow", NA, "brown", "purple", "pink", "pink", 
 "green", NA, "brown")), .Names = c("A", "B", "C"), row.names = c("1", 
 "2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")

answered Mar 03 '18 at 01:43

akrun

Gets the job done. Thanks! Can you tell me a little more about the algorithm and what exactly the function inside `apply` is taking in? Is it each row of the data frame? Also, I've never seen `[]` used next to the data frame object name like you did there. What does that do? – JellisHeRo Mar 03 '18 at 02:10
@JellisHeRo With `MARGIN = 1`, `apply` passes each row of the dataset to the function. We subset the non-NA elements with `x[!is.na(x)]`, then the NA elements with `x[is.na(x)]`, and concatenate them with `c`. Assigning to `df[]` replaces the values while keeping the original data frame structure (its class and column names) – akrun Mar 03 '18 at 02:13
Alright, so if I understand this correctly, it essentially rearranges the values that were already in each row. I don't take that the `[]` trick was a new R feature. Maybe it is but it wouldn't surprise me. – JellisHeRo Mar 03 '18 at 02:17
I want to point out that even though my question was apparently asked before, I think your answer is the best of all the answers considered. – JellisHeRo Mar 06 '18 at 15:34
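
For anyone else wondering about the `[]` discussed in the comments, the following is a small sketch on made-up two-row data (not the data from the question). It compares the matrix that `t(apply(...))` produces with what happens when that matrix is assigned into `df[]`.

```r
# Made-up two-row example (not from the question)
df <- data.frame(
  A = c(NA, "orange"),
  B = c("brown", NA),
  stringsAsFactors = FALSE
)

shifted <- t(apply(df, 1, function(x) c(x[!is.na(x)], x[is.na(x)])))

# The result of t(apply(...)) is a plain character matrix
is.matrix(shifted)      # TRUE
is.data.frame(shifted)  # FALSE

# `df <- shifted` would turn df into that matrix.
# `df[] <- shifted` instead fills the existing data frame cell by cell.
df[] <- shifted
is.data.frame(df)       # TRUE
names(df)               # "A" "B"
```

The empty brackets select every cell of `df`, so the assignment goes through the data frame replacement method and only the contents change.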
