I am trying to make a simple line of code to detect where there are incorrect entries in a dataframe. Consider the following example:

author   val1   val2   val3   val4
A         1      B      1      NA
A         NA     NA     NA     NA
NA        2      B      NA     B
NA        NA     NA     NA     B
NA        NA     NA     NA     NA
A         2      A      NA     B

A row always needs to have the author filled in, but this is sometimes forgotten. Also, sometimes row 2 has the author filled in, but by accident the rest of the data is entered on row 3.

What I want is to filter for rows that have NA for author, and then keep only those rows that have a data entry in any other column. So my expected output for the above example would be:

author   val1   val2   val3   val4
NA        2      B      NA     B
NA        NA     NA     NA     B

Filtering for the rows with NA for author is easy, but I can't figure out what to do next. My code so far:

df %>%
  filter(
    is.na(author)
  ) %>%
  filter(
    across(
      .cols = everything(),
      .fns = ~ !is.na(.x)
    )
  )

I have the feeling I am pretty close, but after a few hours of trying and searching on Stack Overflow my code still returns an empty dataframe. I would prefer a solution in tidyverse syntax, but any help is much appreciated.

Stevestingray

1 Answer

My code is not very efficient but it seems to work.

library(stringr)
library(rebus)
library(tidyverse)
library(magrittr)  # attached last, so extract() below is magrittr's, not tidyr's

df <- tibble(author = c('A', 'A', NA, NA, NA, 'A'),
             val1   = c(1, NA, 2, NA, NA, 2),
             val2   = c('B', NA, 'B', NA, NA, 'A'),
             val3   = c(1, NA, NA, NA, NA, NA),
             val4   = c(NA, NA, 'B', 'B', NA, 'B'))

# Keep only the rows where the author is missing
df_na <- filter(df, is.na(author))

# For each column, str_which() returns the positions of the non-NA entries
# (an NA never matches the pattern)
index <- map(df_na, ~ str_which(.x, pattern = rebus::or(ANY_CHAR, DGT))) %>%
    keep(~ length(.x) != 0) %>%  # drop columns that are all NA
    unlist() %>%
    unique()

# Select the rows that have at least one entry
df_na %>% extract(index, )
jpdugo17
  • Thanks for the suggestion! When using the last line, with the extract function, I get this error returned: Error: Must extract column with a single valid subscript. x Subscript `var` has size 2 but must be size 1. I understand only half of the code, so it is difficult for me to tackle the error, unfortunately. – Stevestingray May 21 '21 at 19:35
  • The last line is equivalent to: df_na[index, ] – jpdugo17 May 21 '21 at 20:36
  • Thanks, I got it working! My own dataset was somewhat more complex, so the code needed to be adjusted. Works perfectly now, thanks a lot! – Stevestingray May 23 '21 at 14:54
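
For reference, a shorter tidyverse variant of the same filter might look like the sketch below. It is not part of the answer above and assumes dplyr 1.0.4 or later, where if_any() is available. The attempt in the question comes back empty because conditions built with across() inside filter() are combined with AND over every selected column, and that selection includes author, which is NA in every remaining row. Using if_any() on all columns except author asks for "at least one non-NA entry" instead. The name result is only illustrative.

```r
library(dplyr)

# Same example data as in the answer above
df <- tibble(
  author = c("A", "A", NA, NA, NA, "A"),
  val1   = c(1, NA, 2, NA, NA, 2),
  val2   = c("B", NA, "B", NA, NA, "A"),
  val3   = c(1, NA, NA, NA, NA, NA),
  val4   = c(NA, NA, "B", "B", NA, "B")
)

result <- df %>%
  filter(
    is.na(author),                                 # author is missing ...
    if_any(.cols = -author, .fns = ~ !is.na(.x))   # ... but some other column is filled in
  )

print(result)
```

On the example data this should keep the third and fourth rows and drop the row that is NA in every column.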