Moving elements from column to column in r

Question

I have a dataframe that looks like this (but for every US county)

county	state	n_state_1	n_state_2	n_state_3	n_state_4
Autauga County	AL	NA	FL	NA	NA
Baldwin County	AL	GA	NA	TN	NA
Catron County	AL	FL	GA	NA	CA

I want to move the non-missing values (FL,GA,TN etc.) to the first columns starting from n_state_1 and then delete the columns containing only missing values to get:

county	state	n_state_1	n_state_2	n_state_3
Autauga County	AL	FL	NA	NA
Baldwin County	AL	GA	TN	NA
Catron County	AL	FL	GA	CA

I am struggling with the first step. I thought about using the function distinct but it doesn't work because there are non-empty elements in each column.

Related question: https://stackoverflow.com/questions/68245914/how-to-remove-rows-that-have-repeated-elements/68246062#68246062 — Martin Gal, Jul 04 '21 at 18:03
[Shifting non-NA cells to the left](https://stackoverflow.com/questions/23285215/shifting-non-na-cells-to-the-left) and Linked therein. — Henrik, Jul 04 '21 at 18:21

Martin Gal · Accepted Answer · 2021-07-04T18:53:00.087

You could use dplyr and tidyr:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(starts_with("n_state")) %>% 
  drop_na() %>%
  group_by(county, state) %>% 
  mutate(name=row_number()) %>% 
  pivot_wider(names_prefix="n_state_")

which returns

  county         state n_state_1 n_state_2 n_state_3
  <chr>          <chr> <chr>     <chr>     <chr>    
1 Autauga_County AL    FL        NA        NA       
2 Baldwin_County AL    GA        TN        NA       
3 Catron_County  AL    FL        GA        CA

What happened here?

pivot_longer takes the n_state_{n}-columns and collapses them into two columns: the name-column contains the original column name (n_state_1, n_state_2 etc), the value-column contains the states (FL, GA or <NA> in many cases).
Next we remove every <NA> entry. (Note: I use <NA> to make clear it's an NA-value).)
After a grouping by county and state we add a rownumber. These numbers will be later used to create the new column names.
pivot_wider now takes these row numbers and prefixes them with n_state_ to get the new columns. The values are taken from the value-column created in the second line of code. pivot_wider fills the missing values with <NA>-values (default behaviour).

Data

structure(list(county = c("Autauga_County", "Baldwin_County", 
"Catron_County"), state = c("AL", "AL", "AL"), n_state_1 = c(NA, 
"GA", "FL"), n_state_2 = c("FL", NA, "GA"), n_state_3 = c(NA, 
"TN", NA), n_state_4 = c(NA, NA, "CA")), problems = structure(list(
    row = 3L, col = "n_state_4", expected = "", actual = "embedded null", 
    file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame")), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -3L), spec = structure(list(
    cols = list(county = structure(list(), class = c("collector_character", 
    "collector")), state = structure(list(), class = c("collector_character", 
    "collector")), n_state_1 = structure(list(), class = c("collector_character", 
    "collector")), n_state_2 = structure(list(), class = c("collector_character", 
    "collector")), n_state_3 = structure(list(), class = c("collector_character", 
    "collector")), n_state_4 = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

It worked exactly how I wanted it:) but I don't quite follow the code even after looking into it. In particular, I don't understand what is going on in the second line (the pivot_longer function) and the last line. If you could provide some intuition, I'd be extremely grateful! — Jack Olive, Jul 04 '21 at 18:38
Added a brief explanation to clearify the code. Hope it helps. — Martin Gal, Jul 04 '21 at 18:53

score 2 · Answer 2 · answered Jul 04 '21 at 18:55

Or another option with dapply from collapse and select only columns with any non-NA elements

library(collapse)
library(dplyr)
dapply(df1, MARGIN = 1, FUN = function(x) c(x[!is.na(x)], x[is.na(x)])) %>%
   select(where(~ any(complete.cases(.))))
# A tibble: 3 x 5
  county         state n_state_1 n_state_2 n_state_3
  <chr>          <chr> <chr>     <chr>     <chr>    
1 Autauga_County AL    FL        <NA>      <NA>     
2 Baldwin_County AL    GA        TN        <NA>     
3 Catron_County  AL    FL        GA        CA

Moving elements from column to column in r

2 Answers2

Data