3

I have a data frame that looks similar to this (I've cut some out for easier reference, data has 93 rows):

                 Rank                      1     A      B      C    D
                 34                     (TPE)    2      4      6    12
                 35                     (TUR)    2      2      9    13
                 36                     (GRE)    2      1      1     4
              (UGA)                         2    1      1      4   <NA>

I need to have the columns line up, but some of the data in "Rank" is offset to the left one column. I have assigned the rows with this problem to a vector: off.set.rows <- c(which(is.na(df[ , 6]))) I need to have all rows in that vector shift one column to the right and replace the empty space it leaves in column 1 with the number in column 1 in the row previous to it. It should look like this:

                 Rank                      1     A      B      C    D
                 34                     (TPE)    2      4      6    12
                 35                     (TUR)    2      2      9    13
                 36                     (GRE)    2      1      1     4
                 36                     (UGA)    2      1      1     4   

I've tried this: df[off.set.rows, 1:(ncol(df))] <- df[off.set.rows, 2:(ncol(df))], but it shifts everything in the row left one column and the (UGA) disappears, it moves the to column 5 and then repeats the value that moves into column 2 again in column 6 like this:

                 Rank                      1     A      B      C    D
                 34                     (TPE)    2      4      6    12
                 35                     (TUR)    2      2      9    13
                 36                     (GRE)    2      1      1     4
                 2                         1     1      4     <NA>   2

Help is much appreciated!!

data_life
  • 387
  • 1
  • 11
  • Related: [Shifting non-NA cells to the left](https://stackoverflow.com/questions/23285215/shifting-non-na-cells-to-the-left) – Henrik Nov 07 '21 at 09:11

2 Answers2

2

Base R solution: How it works:

  1. Subset df to only those rows that met the criteria defined in your off.set.rows
  2. add a new column at the start to x
  3. paste colnames from df to x
  4. bind the rows of df and x together
  5. remove the rows that meet the criteria defined in your off.set.rows
  6. Use lag() to add the value above in Rank
off.set.rows <- c(which(is.na(df[ , 6])))

x <- subset(df, rownames(df) %in% off.set.rows)
x <- cbind(new=0, x)
colnames(x) <- colnames(df)
df <- rbind(df, x[1:6])
df <- subset(df, !rownames(df) %in% off.set.rows)
df$Rank <- ifelse(df$Rank==0, lag(df$Rank), df$Rank)
   Rank    X1 A B C  D
1    34 (TPE) 2 4 6 12
2    35 (TUR) 2 2 9 13
3    36 (GRE) 2 1 1  4
41   36 (UGA) 2 1 1  4

data:

df <- structure(list(Rank = c("34", "35", "36", "(UGA)"), X1 = c("(TPE)", 
"(TUR)", "(GRE)", "2"), A = c(2L, 2L, 2L, 1L), B = c(4L, 2L, 
1L, 1L), C = c(6L, 9L, 1L, 4L), D = c(12L, 13L, 4L, NA)), class = "data.frame", row.names = c(NA, 
-4L))
TarJae
  • 72,363
  • 6
  • 19
  • 66
  • that actually just removed the rows that didn't line up and then added them at the bottom as new rows that were still in the same order. So the row that was (UGA), 2,1,1,4 actually got put at the bottom of the table instead of where it was on row 37, and it removed the "(UGA)" entirely. Any insight? – data_life Nov 07 '21 at 07:49
  • Is the output that I present your expected output? If yes then please add your data as follow: 1. write `dput(yourdata)` 2. copy the output 3. edit your question and finally paste it there. – TarJae Nov 07 '21 at 07:55
1

I think the attempt in the original post was close. The column numbers to replace should be from 2 to ncol(df) (2-6). And those are replaced by columns 1 through ncol(df) - 1 (1-5).

After moving values for those particular rows, I might consider replacing the first column in those rows with NA, then use fill from tidyr to replace them with the last non-missing value. This will also take care of situations when you may have consecutive offset rows (if that is a possibility).

library(tidyr)

off.set.rows <- c(which(is.na(df[ , ncol(df)])))
df[off.set.rows, 2:ncol(df)] <- df[off.set.rows, 1:ncol(df)-1]
df[off.set.rows, 1] <- NA
fill(df, 1, .direction = "down")

Output

  Rank    X1 A B C  D
1   34 (TPE) 2 4 6 12
2   35 (TUR) 2 2 9 13
3   36 (GRE) 2 1 1  4
4   36 (UGA) 2 1 1  4
Ben
  • 28,684
  • 5
  • 23
  • 45
  • Thank you!! Just so that I make sure I'm not having this same problem at another time, would you mind explaining why the code line with the "ncol" work when done that way? I'm not entirely sure what the "-1" is doing. And specifically what is the "df[off.set.rows, 1] <- NA is doing? – data_life Nov 08 '21 at 00:45
  • Sure...the `ncol(df)-1` is taking the number of columns, `ncol(df)` which is 6, and subtracting 1 to get 5. In other words, columns 1-5 for that row are being "shifted" to columns 2-6. Since the `NA` is at the end (column 6), your data for that row is contained within columns 1 through 5. – Ben Nov 08 '21 at 00:52
  • As for the `df[off.set.rows, 1] <- NA` is just replacing the first column in rows that were shifted with `NA` (missing). That just allows you to use `fill` explicitly on that whole column, where the missing values are replaced by the most recent non-missing value. In the example above, you have "(UGA)" in column 1 where it doesn't belong. This is changed to `NA`, and then `fill` will replace with the most recent non-missing (or non-`NA`) value, which is 36. – Ben Nov 08 '21 at 00:54