0

I am stuck. How do I simultaneous us two relational operators as filters within group within group using dplyr's filter()

What I got

data,

# install.packages(c("tidyverse"), dependencies = TRUE)
library(tibble)
tbl <- structure(list(id1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
           2L, 2L, 2L, 2L, 2L, 2L), id2 = c("x_02", "x_02", "x_02", "x_02", "x_02", "x_02",
           "x_02", "x_02", "x_02", "x_02", "x_02", "x_02", "x_02", "x_03", "x_03", "x_03",
           "x_03", "x_03", "x_03", "x_03", "x_03"), x = c(-4L, -3L, -2L, -1L, 1L, 2L, 3L,
           4L, 5L, -2L, -1L, 1L, 2L, -2L, -1L, 1L, 2L, 3L, 4L, 5L, 6L)),
           class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -21L),
           vars = c("id1", "id2"), drop = TRUE, .Names = c("id1", "id2", "x"),
           indices = list(0:8, 9:12, 13:20), group_sizes = c(9L, 4L, 8L),
           biggest_group_size = 9L, labels = structure(list(id1 = c(1L, 2L, 2L),
           id2 = c("x_02", "x_02", "x_03")), class = "data.frame", row.names = c(NA, -3L),
           vars = c("id1", "id2"), drop = TRUE, .Names = c("id1", "id2")))

tbl
#> # A tibble: 21 x 3
#> # Groups:   id1, id2 [3]
#>      id1   id2     x
#>    <int> <chr> <int>
#>  1     1  x_02    -4
#>  2     1  x_02    -3
#>  3     1  x_02    -2
#>  4     1  x_02    -1
#>  5     1  x_02     1
#>  6     1  x_02     2
#>  7     1  x_02     3
#>  8     1  x_02     4
#>  9     1  x_02     5
#> 10     2  x_02    -2
#> # ... with 11 more rows

In short I want to lookup within id1, within id2, and find series of xs that start at x < -2 and end at x > 2 (the desired outcome below mig illustrate it better than I describe it here).

In a way I see it as a combination of two filters, i.e., I want this filter,

library(dplyr)
tbl %>% group_by(id1, id2) %>%
   filter( (row_number() == n() & x >  2 ) )
#> # A tibble: 2 x 3
#> # Groups:   id1, id2 [2]
#>     id1   id2     x
#>   <int> <chr> <int>
#> 1     1  x_02     5
#> 2     2  x_03     6

to be combined with this filter,

tbl %>% group_by(id1, id2) %>%
   filter( (row_number() == 1 & x < -2 )  )
#> # A tibble: 1 x 3
#> # Groups:   id1, id2 [1]
#>     id1   id2     x
#>   <int> <chr> <int>
#> 1     1  x_02    -4

something like this, I thought, but that doesn't give me any data.

tbl %>% group_by(id1, id2) %>%
   filter( (row_number() == n() & x >  2 ) &
           (row_number() == 1   & x < -2 )  )

Why doesn't this give?

What I am trying to get / desired outcome

# A tibble: 2 x 3
# Groups:   id1, id2 [1]
    id1   id2     x
  <int> <chr> <int>
1     1  x_02    -4
1     1  x_02     5
Eric Fail
  • 8,191
  • 8
  • 72
  • 128
  • 1
    Can you explain why your conditions in the examples don't match the explanation you provide? You say for instance you want to lookup within id1 and id2 and find series of x that start at > -2 and end at < 2, but your condition involves a row number you haven't mentioned, and the opposite of your > -2 statement: filter( (row_number() == n() & x > 2 ) ). I can't see a relationship between the conditions you mention and your expected result at all. – Stewart Ross Jan 25 '18 at 06:22
  • @StewartRoss, thank you for your question. If I understand it correct the call `row_number() == n()` gives me the last, `n()`th, row, `row_number()` within, `group_by()`. In this case is `id1` and `id2`. Then I combine that with the condition, `&`, that this specific last row has to be greater then 2 in this row, i.e. `x > 2`. This is what I do in the first call after the data. The call where the outcome is `id11` 1 and 2 and `id2` is `x_02` and `x_03` with the values 5 and 6. I hope that answers you question? If not please ask again. Possible take a look at my _desired outcome_. Thanks. – Eric Fail Jan 25 '18 at 07:26
  • Your code asks to keep in each group each row that is the last row & is the first row & has x > 2 & has x < -2. (Of which there must be none.) But that's not what you said you wanted. – philipxy Jan 25 '18 at 10:16
  • I've updated my text (just below the data). You said _Of which there must be none_. I am not sure I agree. For case `id1 == 1` within group `id2 == x_02` there is a series of `x` values starting at `-4` and ending in `5`, that is exactly what I am looking to extract the first and last value of. That's what I try to show in my _desired outcome_. I think I might have missed somehing? Please feel free to suggest any edits you think would clarify the question. – Eric Fail Jan 25 '18 at 10:25
  • Have you tried `filter( (row_number() == n() & x > 2 ) | (row_number() == 1 & x < -2 ) )`? – CPak Jan 25 '18 at 13:31
  • @CPak, thank you for your comment. I did try that. It yields a combination of the two outputs I posted. I want to check simultaneously within `id1` and `id2` and than look at both the first row and the last row with some `abs(2)`. – Eric Fail Jan 25 '18 at 13:34
  • I am just telling you what your code says, not what you want. No row has x > 2 & x < -2. But that's moot. You also don't want only rows that among other things are both first & last, ie that are alone in their group, but that's what the code returns. You want something else. Although "find series" doesn't say it. It's not clear to me what exactly you want from your description & example. I can't connect row_number()=n() with series. It has nothing to do with where series start or end in general. Plus your output is about *endpoints* of series. PS Read re window functions & series in tables. – philipxy Jan 26 '18 at 07:31
  • What are you asking exactly? Why your example code returns nothing & what similar code would return your example output specifically for your example input, or what code would return what you actually want in the long run? These are two different questions. And you haven't been clear about either or which you're asking. Is the latter just background to trying to ask the former? PS Re your long run goal: You seem in your (unclear) question & comments to be confusing the notions of 1st & last rows of *series* with that of a *group*. – philipxy Jan 26 '18 at 07:41
  • Thanks for all the feedback. CPak provided a great answer that is consistent with my desired outcome. – Eric Fail Jan 26 '18 at 07:45

1 Answers1

2

When both conditions are met within a group, there should be 2 rows for that group, so just filter again n() == 2

tbl %>%
  filter((row_number() == n() & x > 2 ) | (row_number() == 1 & x < -2 )) %>%
  filter(n() == 2)

# A tibble: 2 x 3
# Groups: id1, id2 [1]
    # id1 id2       x
  # <int> <chr> <int>
# 1     1 x_02     -4
# 2     1 x_02      5

Try the additional (see comments):

tbl %>%
  filter((row_number() == n() & x > 2 ) | (row_number() == 1 & x < -2 )) %>%
  filter(n() == 2) %>%
  distinct(id1, id2) %>%
  left_join(., tbl, by=c("id1", "id2"))
CPak
  • 13,260
  • 3
  • 30
  • 48
  • Please put your answer in your answer, not in comments. – philipxy Jan 26 '18 at 07:25
  • 1
    Thank you. I'm surprised by the simplicity. In addition, thanks a lot for also responding to the additional part. I manged to get this solution `filter(tbl, any(x > 2) & any(x < -2))` [from markus](https://stackoverflow.com/a/48449322/1305688) that get's me the identical output. – Eric Fail Jan 26 '18 at 08:33