
This may be a silly question, but when I filter a data.frame I don't get the same results with a variable as with a constant in dplyr::filter, or with base subsetting. First, an example:

library(dplyr)
tt <- data.frame(t = runif(100, max = 100)) %>% mutate(period = trunc((t + 3) / 12))
i <- 0
tt %>% filter(period==0)
tt %>% filter(period==i)
tt[tt$period == i,]

Here the three results are equivalent:

> tt %>% filter(period==0)
         t period
1 4.047352      0
2 2.391890      0
3 6.050928      0
4 1.646503      0
5 2.335137      0
> tt %>% filter(period==i)
         t period
1 4.047352      0
2 2.391890      0
3 6.050928      0
4 1.646503      0
5 2.335137      0
> tt[tt$period == i,]
          t period
23 4.047352      0
47 2.391890      0
75 6.050928      0
93 1.646503      0
95 2.335137      0

Then I ran the same operations on the real (big) data.frame and did not get equivalent results:

patch_sparse <- patch_sparse %>% mutate(period = trunc( (t+3) / 12))
str(patch_sparse)

'data.frame':   768307 obs. of  7 variables:
 $ t     : num  1 1 1 1 1 1 1 1 1 1 ...
 $ i     : int  2864 2864 2864 2864 2876 2876 2875 2876 2875 2857 ...
 $ j     : int  3109 3110 3111 3112 3112 3113 3114 3114 3115 3116 ...
 $ data  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ date  : chr  "2000-11-01" "2000-11-01" "2000-11-01" "2000-11-01" ...
 $ region: chr  "Australia" "Australia" "Australia" "Australia" ...
 $ period: num  0 0 0 0 0 0 0 0 0 0 ...

#
i <- 0
patch_sparse %>% filter(period==0) 
patch_sparse %>% filter(period==i) 
patch_sparse[patch_sparse$period == i,]

And the results are:

> patch_sparse %>% filter(period==0) 
    t    i    j data       date    region period
1   1 2864 3109 TRUE 2000-11-01 Australia      0
2   1 2864 3110 TRUE 2000-11-01 Australia      0
3   1 2864 3111 TRUE 2000-11-01 Australia      0
...
142 2 3457 1524 TRUE 2000-12-01 Australia      0
 [ reached 'max' / getOption("max.print") -- omitted 2346 rows ]

> patch_sparse %>% filter(period==i) 
[1] t      i      j      data   date   region period
<0 rows> (or 0-length row.names)

> patch_sparse[patch_sparse$period == i,]
    t    i    j data       date    region period
1   1 2864 3109 TRUE 2000-11-01 Australia      0
2   1 2864 3110 TRUE 2000-11-01 Australia      0
3   1 2864 3111 TRUE 2000-11-01 Australia      0
..
142 2 3457 1524 TRUE 2000-12-01 Australia      0
 [ reached 'max' / getOption("max.print") -- omitted 2346 rows ]

I tried converting the data.frame to a tibble and replacing trunc() with as.integer(), with similar results, and I can't produce a reproducible example. Any ideas?

Leosar

1 Answer


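The problem is that your data contains a column named i. dplyr verbs such as filter() use data masking: a name in the expression is looked up among the data frame's columns first, and only then in the calling environment. So what patch_sparse %>% filter(period==i) actually does is keep the rows where period equals the column i of your data, not your variable i. Base subsetting with patch_sparse$period == i does no masking, so it uses the variable i (0) and agrees with filter(period==0). Your small example behaved as expected because tt has no column named i.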
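The effect can be reproduced on a small scale. The sketch below uses a made-up data frame `df` (hypothetical, not your data) whose only relevant feature is a column named i:

```r
library(dplyr)

# Hypothetical toy data with a column named i, like patch_sparse
df <- data.frame(period = c(0, 0, 1, 2), i = c(5L, 6L, 7L, 8L))
i <- 0

# Inside filter(), i resolves to the column df$i, not the global i,
# so this compares period with the column row by row
masked <- df %>% filter(period == i)

# Base subsetting evaluates i in the calling environment (the global i)
base <- df[df$period == i, ]

nrow(masked)  # no row has period equal to its own i
nrow(base)    # the two rows with period 0
```

With these values the dplyr call returns no rows while the base call returns the two rows with period 0, which mirrors what you see on patch_sparse.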

So if you want to filter on an external scalar, make sure its name differs from all of your data's column names, e.g. something like:

filter_i <- 0
patch_sparse %>% filter(period==filter_i)
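If renaming is inconvenient, data masking also provides explicit ways to say which i you mean: the `.env` pronoun and the `!!` injection operator both refer to the variable outside the data, while `.data` refers to the column. A short sketch on a made-up data frame `df` (hypothetical, not your data), assuming a reasonably recent dplyr (1.0 or later):

```r
library(dplyr)

# Hypothetical toy data with a column named i
df <- data.frame(period = c(0, 0, 1, 2), i = c(5L, 6L, 7L, 8L))
i <- 0

# .env$i: look up i in the calling environment, skipping the columns
by_env <- df %>% filter(period == .env$i)

# !!i: inject the current value of the outside variable into the expression
by_inject <- df %>% filter(period == !!i)

# .data$i: explicitly the column, same as writing a bare i here
by_data <- df %>% filter(period == .data$i)
```

Here `by_env` and `by_inject` keep the two rows with period 0, and `by_data` is empty, matching the unqualified filter(period == i).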
deschen