3

I have a dataframe that looks like this (but for every US county)

county state n_state_1 n_state_2 n_state_3 n_state_4
Autauga County AL NA FL NA NA
Baldwin County AL GA NA TN NA
Catron County AL FL GA NA CA

I want to move the non-missing values (FL,GA,TN etc.) to the first columns starting from n_state_1 and then delete the columns containing only missing values to get:

county state n_state_1 n_state_2 n_state_3
Autauga County AL FL NA NA
Baldwin County AL GA TN NA
Catron County AL FL GA CA

I am struggling with the first step. I thought about using the function distinct but it doesn't work because there are non-empty elements in each column.

Jack Olive
  • 33
  • 3
  • Related question: https://stackoverflow.com/questions/68245914/how-to-remove-rows-that-have-repeated-elements/68246062#68246062 – Martin Gal Jul 04 '21 at 18:03
  • [Shifting non-NA cells to the left](https://stackoverflow.com/questions/23285215/shifting-non-na-cells-to-the-left) and Linked therein. – Henrik Jul 04 '21 at 18:21

2 Answers2

3

You could use dplyr and tidyr:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(starts_with("n_state")) %>% 
  drop_na() %>%
  group_by(county, state) %>% 
  mutate(name=row_number()) %>% 
  pivot_wider(names_prefix="n_state_")

which returns

  county         state n_state_1 n_state_2 n_state_3
  <chr>          <chr> <chr>     <chr>     <chr>    
1 Autauga_County AL    FL        NA        NA       
2 Baldwin_County AL    GA        TN        NA       
3 Catron_County  AL    FL        GA        CA  

What happened here?

  • pivot_longer takes the n_state_{n}-columns and collapses them into two columns: the name-column contains the original column name (n_state_1, n_state_2 etc), the value-column contains the states (FL, GA or <NA> in many cases).
  • Next we remove every <NA> entry. (Note: I use <NA> to make clear it's an NA-value).)
  • After a grouping by county and state we add a rownumber. These numbers will be later used to create the new column names.
  • pivot_wider now takes these row numbers and prefixes them with n_state_ to get the new columns. The values are taken from the value-column created in the second line of code. pivot_wider fills the missing values with <NA>-values (default behaviour).

Data

structure(list(county = c("Autauga_County", "Baldwin_County", 
"Catron_County"), state = c("AL", "AL", "AL"), n_state_1 = c(NA, 
"GA", "FL"), n_state_2 = c("FL", NA, "GA"), n_state_3 = c(NA, 
"TN", NA), n_state_4 = c(NA, NA, "CA")), problems = structure(list(
    row = 3L, col = "n_state_4", expected = "", actual = "embedded null", 
    file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame")), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -3L), spec = structure(list(
    cols = list(county = structure(list(), class = c("collector_character", 
    "collector")), state = structure(list(), class = c("collector_character", 
    "collector")), n_state_1 = structure(list(), class = c("collector_character", 
    "collector")), n_state_2 = structure(list(), class = c("collector_character", 
    "collector")), n_state_3 = structure(list(), class = c("collector_character", 
    "collector")), n_state_4 = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))
Martin Gal
  • 16,640
  • 5
  • 21
  • 39
  • It worked exactly how I wanted it:) but I don't quite follow the code even after looking into it. In particular, I don't understand what is going on in the second line (the pivot_longer function) and the last line. If you could provide some intuition, I'd be extremely grateful! – Jack Olive Jul 04 '21 at 18:38
  • Added a brief explanation to clearify the code. Hope it helps. – Martin Gal Jul 04 '21 at 18:53
2

Or another option with dapply from collapse and select only columns with any non-NA elements

library(collapse)
library(dplyr)
dapply(df1, MARGIN = 1, FUN = function(x) c(x[!is.na(x)], x[is.na(x)])) %>%
   select(where(~ any(complete.cases(.))))
# A tibble: 3 x 5
  county         state n_state_1 n_state_2 n_state_3
  <chr>          <chr> <chr>     <chr>     <chr>    
1 Autauga_County AL    FL        <NA>      <NA>     
2 Baldwin_County AL    GA        TN        <NA>     
3 Catron_County  AL    FL        GA        CA       
akrun
  • 874,273
  • 37
  • 540
  • 662