
My dataset has a character column named start_station_name with some missing values, and I'm trying to remove every row where that column is blank or NA.

I know that more than 7000 rows have no value in start_station_name, yet when I try to remove them, R doesn't find them and only removes 50 rows:

SD_cleaned <- drop_na(SD) 

Here is a sample of the dataset:

ride_id bike_type start_station_name
273C6C2B99EBAC32 electric_bike
7AB7965997435172 electric_bike Rush St & Superior St
D6C2BC6711446FB5 electric_bike
C2433C9CF5941BBF electric_bike Rush St & Superior St
... ... ...

I've also tried with na.omit() and is.na(), but I got the same result.

Thanks for any feedback

Sydibyd
  • `NA` values are different from blanks `""`. Functions like `is.na`, `na.omit`, and `drop_na` handle `NA` values, not blanks. – Gregor Thomas Aug 17 '22 at 15:07
  • Welcome to SO. In R, an empty character string is not the same as `NA` or a character string that consists of any number of whitespace characters. To help you solve the problem, we really need to see a sample of your data as produced by `dput()`. That's the only way we can be sure *exactly* what sort of "missing" character values you are dealing with. – Limey Aug 17 '22 at 15:09

1 Answer


drop_na only removes rows that contain NA. If the missing values are blanks (""), convert them to NA before calling drop_na:

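library(dplyr)
library(tidyr)  # drop_na() comes from tidyr
SD_cleaned <- SD %>%
   # apply na_if() column by column; na_if(SD, "") on a whole data frame fails in dplyr >= 1.1.0
   mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
   drop_na()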
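To see the difference between blanks and `NA` on a small scale, here is a minimal sketch using a made-up data frame `toy` that mimics the sample in the question. The values and the names `toy`, `n_na`, `n_blank`, `only_na_dropped` and `both_dropped` are illustrative assumptions, not part of the real dataset.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one blank string and one real NA
toy <- tibble(
  ride_id = c("A1", "B2", "C3"),
  start_station_name = c("", "Rush St & Superior St", NA)
)

# Count each kind of "missing" value separately
n_na    <- sum(is.na(toy$start_station_name))
n_blank <- sum(toy$start_station_name == "", na.rm = TRUE)

# drop_na() on its own only removes the row holding a real NA
only_na_dropped <- drop_na(toy)

# Converting blanks to NA first lets drop_na() remove both rows
both_dropped <- toy %>%
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  drop_na()
```

Running the two counts on the real data (with `SD` in place of `toy`) is a quick way to check how many of the 7000+ rows are blanks rather than `NA`.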

If some values consist only of spaces, apply trimws to each character column before converting the blanks to NA:

SD_cleaned <- SD %>%
     # strip surrounding whitespace, then turn the resulting empty strings into NA
     mutate(across(where(is.character), ~ na_if(trimws(.x), ""))) %>%
     drop_na()
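Note that `drop_na()` with no arguments drops rows that have an `NA` in any column, which may be more than the question asks for. If only start_station_name matters, a possible variant is to pass that column to `drop_na()`. The sketch below assumes a made-up data frame `toy`, and the `end_station_name` column is hypothetical, added only to show that an `NA` elsewhere does not cause a row to be dropped.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: blank, whitespace-only and NA station names,
# plus an NA in an unrelated (made-up) column
toy <- tibble(
  ride_id = c("A1", "B2", "C3", "D4"),
  end_station_name = c(NA, "X", "Y", "Z"),
  start_station_name = c("Rush St & Superior St", "  ", "", NA)
)

cleaned <- toy %>%
  # normalise only the column of interest
  mutate(start_station_name = na_if(trimws(start_station_name), "")) %>%
  # drop rows where that one column is NA; other columns are ignored
  drop_na(start_station_name)
```

Here row `A1` is kept even though its `end_station_name` is `NA`, because only start_station_name is checked.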
akrun