When I try to combine multiple character columns using unite() from tidyr, the na.rm = TRUE option does not seem to remove the NA values.

Step by step:

  1. The original dataset has 5 columns, word1:word5.
  2. I want to combine word1:word5 into a single column using this code:
    data_unite_5 <- data_original_5 %>%
        unite("pentawords", word1:word5, sep = " ", na.rm = TRUE, remove = FALSE)
  3. I've tried using mutate_if(is.factor, as.character) but that did not work.

Any suggestions would be appreciated.

smci
nsmd
  • Couldn't figure out why na.rm did not work as intended with the unite function. – nsmd Sep 09 '20 at 18:46
  • What did you expect? `unite` replaces the NAs by an empty string `""`. Hence, if there are only NAs the result is an empty string, i.e. `""`. – stefan Sep 09 '20 at 20:57
  • Never describe a problem as *"that did not work"*, describe it as e.g. *"I expected the rows where any of these five columns contain NAs to get dropped, but they didn't"* – smci Sep 09 '20 at 21:25

1 Answer

You have misinterpreted how the na.rm argument works for unite. In the examples from the tidyr documentation for unite, z is the result of uniting x and y.

With na.rm = FALSE

#>   z     x     y    
#>   <chr> <chr> <chr>
#> 1 a_b   a     b    
#> 2 a_NA  a     NA   
#> 3 NA_b  NA    b    
#> 4 NA_NA NA    NA   

With na.rm = TRUE

#>   z     x     y    
#>   <chr> <chr> <chr>
#> 1 "a_b" a     b    
#> 2 "a"   a     NA   
#> 3 "b"   NA    b    
#> 4 ""    NA    NA  

Hence na.rm determines how NA values appear in the assembled strings (pentawords); it does not drop rows from the data.
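
To check this behaviour yourself, here is a minimal sketch. The tibble `df` is made up for illustration and mirrors the x/y columns from the documentation example; it is not the asker's data.

```r
library(tidyr)
library(tibble)

# Made-up data mirroring the documentation example (not the asker's dataset)
df <- tibble(
  x = c("a", "a", NA, NA),
  y = c("b", NA, "b", NA)
)

# na.rm = TRUE drops the NA pieces before pasting each row together
united <- unite(df, "z", x, y, sep = "_", na.rm = TRUE, remove = FALSE)

nrow(united)  # same number of rows as df: nothing is dropped
united$z      # the all-NA row becomes an empty string
```

The row count is unchanged, and the row where both x and y are NA ends up with z equal to "".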

If you want to remove rows like the fourth one, where every united column is NA, I would recommend filter.

data_unite_5 <- data_original_5 %>%
  unite("pentawords", word1:word5, sep = " ", na.rm = TRUE, remove = FALSE) %>%
  filter(pentawords != "")

This excludes from your output every row whose united string is empty.
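
As a rough, self-contained sketch of the same idea: the tibble `toy` below is a hypothetical stand-in for data_original_5, since the real data is not shown. It also shows one possible alternative, assuming you would rather keep the row and mark it as missing, using dplyr's na_if.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for data_original_5 (the real data is not shown)
toy <- tibble(
  word1 = c("the", "a", NA),
  word2 = c("quick", NA, NA),
  word3 = c("brown", NA, NA),
  word4 = c("fox", NA, NA),
  word5 = c("jumps", NA, NA)
)

united <- toy %>%
  unite("pentawords", word1:word5, sep = " ", na.rm = TRUE, remove = FALSE)

# Option 1: drop rows whose united string is empty
dropped <- united %>%
  filter(pentawords != "")

# Option 2 (an alternative, not part of the original answer):
# keep every row but turn the empty string back into NA
marked <- united %>%
  mutate(pentawords = na_if(pentawords, ""))
```

Option 1 removes the all-NA row, while option 2 keeps it with pentawords set to NA, which may be easier to handle in later steps that already deal with missing values.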

Simon.S.A.