Purrr filter the nested data based on unnested variable containing character vectors

Question

I have data similar to df3. To reproduce it, run the following:

library(tidyverse)

vec1 <- c("A", "B")
vec2 <- c("A", "B", "C")

df1 <- tibble::tribble(
          ~A, ~B,
          "X", 4L,
          "X", 9L,
          "Y", 5L,
          "Y", 2L,
          "Y", 8L,
          "Y", 2L) %>%
  group_by(A) %>% 
  nest()

df2 <- tibble::tribble(
  ~A, ~C,
  "X", vec1,
  "Y", vec2)

df3 <- df1 %>% left_join(df2, by = "A")

I need to filter the nested data using something like this:

df4 <- df3 %>% filter(when C==vec1, B (part of nested data now) < 5 
                      when C==vec2, B (part of nested data now) >4)

or maybe like this:

df4 <- df3 %>% map(.$data, ~filter((identical(.$C, vec1) & B < 5) | 
                                  identical(.$C, vec2) & B >4))

I only have df3 and I want df4. How should I do the above filtering using purrr to get the following desired df4 output?

df11 <- tibble::tribble(
  ~A, ~B,
  "X", 4L,
  "Y", 5L,
  "Y", 8L) %>%
  group_by(A) %>% 
  nest()

df4 <- df11 %>% left_join(df2, by = "A")

I'm not following how `B` is filtered. If `df3$data` consists of nested `B` columns, how are you checking if `B < 5` for example? If `any()` value in `B` is `< 5`? If you could show an expected `df4` result example that would be helpful. — thelatemail, May 01 '18 at 22:18
I want to filter B that is inside the nested dataframes with the variable name "data" created automatically by nest. I have updated the question to reflect the desired output. — Geet, May 01 '18 at 22:26
It's not very efficient, but you can `match` lists: `match(df3$C, list(vec1,vec2))` for instance, which will give you a flag for deciding what to do next. — thelatemail, May 01 '18 at 22:50
Can you suggest that in the form of df4 <- .....df3. It would be very helpful! — Geet, May 01 '18 at 22:59
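
A minimal sketch of the `match` idea from the comments above, using only base R and the `vec1`/`vec2` vectors from the question. `match` converts list elements to character strings before comparing, so each element of `C` is looked up by its contents. The names `C` (a stand-in for `df3$C`) and `idx` are introduced here for illustration only.

```r
vec1 <- c("A", "B")
vec2 <- c("A", "B", "C")

# Stand-in for the list-column df3$C from the question
C <- list(vec1, vec2)

# Position of each element of C within the lookup list
idx <- match(C, list(vec1, vec2))
idx
```

The resulting index can then be used to decide which filter condition applies to each row.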

score 3 · Accepted Answer · answered May 01 '18 at 22:56

Here is one option using map2 and identical for the condition check:

df3 %>% 
    mutate(
        data = map2(
            data, C, ~ if(identical(.y, vec1)) filter(.x, B < 5) else filter(.x, B > 4)
        )
    ) %>% 
    identical(df4)
# [1] TRUE

Psidom


This worked. Can you help me do this with case_when? – Geet May 02 '18 at 00:47
I am not sure you can use `case_when` in this context, since you want to return a list and `case_when` requires a vector output. Hence the `if` and `else` – Calum You May 02 '18 at 01:08
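
One possible workaround, sketched under assumptions: `case_when` cannot return the filtered data frames themselves, but it can return the logical vector that `filter` needs for a single nested data frame. The sketch below rebuilds `df3` by hand, and the helper name `filter_by_vec` is hypothetical. It relies on `case_when` recycling a length-one condition against a longer right-hand side, which should be checked against the installed dplyr version.

```r
library(dplyr)
library(purrr)
library(tibble)

vec1 <- c("A", "B")
vec2 <- c("A", "B", "C")

# Hand-built equivalent of df3 from the question
df3 <- tibble(
  A = c("X", "Y"),
  data = list(tibble(B = c(4L, 9L)), tibble(B = c(5L, 2L, 8L, 2L))),
  C = list(vec1, vec2)
)

# Hypothetical helper: d is one nested data frame, v is the matching element of C.
# case_when() returns a logical vector over the rows of d; rows where no
# condition matches get NA and are dropped by filter().
filter_by_vec <- function(d, v) {
  filter(d, case_when(
    identical(v, vec1) ~ B < 5,
    identical(v, vec2) ~ B > 4
  ))
}

res <- df3 %>% mutate(data = map2(data, C, filter_by_vec))
```

The `if`/`else` version in the answer is still simpler when there are only two cases; a helper like this may read better once more vectors have to be distinguished.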

Calum You · Answer 2 · 2018-05-01T23:06:29.960

Here's a different approach that uses unnest to work on the values of B directly, replacing the original vectors afterwards.

library(tidyverse)
vec1 <- c("A", "B")
vec2 <- c("A", "B", "C")

df3 <- structure(list(A = c("X", "Y"), data = list(structure(list(B = c(4L, 9L)), .Names = "B", row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame")), structure(list(B = c(5L, 2L, 8L, 2L)), .Names = "B", row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))), C = list(c("A", "B"), c("A", "B", "C"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L), .Names = c("A", "data", "C"))

veclist <- list(vec1, vec2)
df3 %>%
  mutate(vec = match(C, veclist)) %>%
  unnest(data) %>%
  filter(vec == 1 & B < 5 | vec == 2 & B > 4) %>%
  nest(B) %>%
  mutate(C = map(vec, ~ veclist[[.]])) %>%
  as.data.frame()
#>   A vec data       C
#> 1 X   1    4    A, B
#> 2 Y   2 5, 8 A, B, C

Created on 2018-05-01 by the reprex package (v0.2.0).

score 1 · Answer 3 · answered May 02 '18 at 01:57

There is no need for if-else statements:

mine <- df3 %>%
  mutate(data = map2(data, match(C, list(vec1, vec2)),
                     ~ filter_(.x, c("B<=4", "B>4")[.y])))
identical(mine, df4)
# [1] TRUE
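
`filter_()` has been deprecated in later dplyr releases, so here is a sketch of the same index-based idea without string conditions. It rebuilds `df3` by hand and assumes a hypothetical list `preds` of predicate functions, one per position in the lookup list; this is an illustration, not the answer's original method.

```r
library(dplyr)
library(purrr)
library(tibble)

vec1 <- c("A", "B")
vec2 <- c("A", "B", "C")

# Hand-built equivalent of df3 from the question
df3 <- tibble(
  A = c("X", "Y"),
  data = list(tibble(B = c(4L, 9L)), tibble(B = c(5L, 2L, 8L, 2L))),
  C = list(vec1, vec2)
)

# Hypothetical helper list: one predicate per lookup position
preds <- list(
  function(b) b <= 4,  # used when C matches vec1
  function(b) b > 4    # used when C matches vec2
)

# match() gives the position of each C in the lookup list;
# that position selects the predicate applied to the nested B column
mine <- df3 %>%
  mutate(data = map2(data, match(C, list(vec1, vec2)),
                     function(d, i) filter(d, preds[[i]](B))))
```

Adding a new case then only means appending a vector to the lookup list and a predicate to `preds`.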

Onyambu

