2

I established a connection to an internal Teradata DB.In order to let you know my issue i have provided an exampled below.

What I am trying to do is to filter columns by my_date and narrow it down by my_key.

But I would like to keep the missing results but somehow the filter drops them within my my_result. Again currently the below code drops them if not available. As mentioned within the comments @zx8754 I have tried to add is.na(my_date) to the filter but without success. I have also tried is.empty, is.null, complete instead of collect.

Any help would be much appreciated.

Reproducible test

# Create my_df
my_key <- c(1, 2, 3,4)
my_date <- as.Date(c('2018-07-01','2018-08-01','2018-09-01','2018-09-01'))
my_df <- data.frame(my_date, my_key)

# Filter_mydf
my_result <- my_df %>% 
  select (my_date,my_key)%>%
  filter(between(my_date ,as.Date("2018-08-01"),as.Date("2018-09-01")),
         my_key %in% c(1,2,3))%>%
  collect()

my_result

my_date my_key
2018-08-01 2
2018-09-01 3

Desired my_result

my_date my_key
NA 1
2018-08-01 2
2018-09-01 3
zx8754
  • 52,746
  • 12
  • 114
  • 209
LePopp
  • 61
  • 6
  • Try using is.na? `filter((between(col_date,20180701,20180930) & col_key %in% c(12345,54321)) | is.na(col_date))` – zx8754 Nov 08 '18 at 09:07
  • 1
    Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Sotos Nov 08 '18 at 09:07
  • Related github post: https://github.com/tidyverse/dplyr/issues/3196 – zx8754 Nov 08 '18 at 09:12
  • @zx8754 Thanks but is.na(col_date) lead into the same result without the missing data. – LePopp Nov 09 '18 at 12:54
  • Provide reproducible data, please. So we can have the same problem as you. – zx8754 Nov 09 '18 at 13:03
  • @zx8754 Sorry, please find a reproducible example within my main question. – LePopp Nov 09 '18 at 13:27

1 Answers1

0

Set my_date to NA if date is not between dates, then filter on my_key:

# Create my_df
my_df <- data.frame(
  my_date = as.Date(c('2018-07-01','2018-08-01','2018-09-01','2018-09-01')),
  my_key = c(1, 2, 3, 4, 5))


library(dplyr)

my_df %>% 
  mutate(my_date = if_else(between(my_date, as.Date("2018-08-01"), as.Date("2018-09-01")) & 
                             my_key %in% c(1, 2, 3), my_date, as.Date(NA))) %>% 
  filter(my_key %in% c(1, 2, 3))

#      my_date my_key
# 1       <NA>      1
# 2 2018-08-01      2
# 3 2018-09-01      3
zx8754
  • 52,746
  • 12
  • 114
  • 209