
When extracting information from a PDF with tabulizer and pdftools, I sometimes want to index a large list of data frames based on a regex pattern match.

a <- data.frame(yes=c("pension"))
b <- data.frame(no=c("other"))
my_list <- list(a,b)

I would like to use str_detect to return the index of the data frame(s) in the list that contain the pattern "pension".

The desired output would be:

index <- 1  # obtained via which() and str_detect()
new_list <- my_list[[index]]
new_list
     yes
1 pension

Detecting the pattern inside the underlying data frames and then returning the index with which() has been a struggle. Previous discussions use loops and if/else statements, but I would prefer a purrr solution.
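For reference, here is a sketch of the approach the question asks about: stringr::str_detect() flags the data frames that contain a match, and which() turns those flags into integer positions. It assumes every column can be treated as text (as.character() is applied so factor columns also work); `has_pattern` is a hypothetical helper name, not part of any package.

```r
library(purrr)
library(stringr)

a <- data.frame(yes = c("pension"))
b <- data.frame(no = c("other"))
my_list <- list(a, b)

# Hypothetical helper: TRUE if any cell of the data frame matches `pattern`
has_pattern <- function(df, pattern) {
  any(map_lgl(df, function(col) any(str_detect(as.character(col), pattern))))
}

# One logical per data frame, then which() gives the integer positions
flags <- map_lgl(my_list, has_pattern, pattern = "pension")
index <- which(flags)  # 1
new_list <- my_list[[index]]
new_list
```

Note that `my_list[[index]]` only works when exactly one data frame matches; with several matches, `my_list[index]` returns all of them as a list.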

zx8754
David Lucey

1 Answer


We may use

library(purrr)  # map(), map_lgl() and the %>% pipe
getIdx <- function(pattern, l)
  l %>% map_lgl(~ any(unlist(map(.x, grepl, pattern = pattern))))

getIdx("pension", my_list)
# [1]  TRUE FALSE

my_list[getIdx("pension", my_list)]
# [[1]]
#       yes
# 1 pension

This also handles several matching data frames, since logical indexing selects all of them at once, so which() isn't really needed.

In getIdx we iterate over the data frames in l; within each data frame we iterate over its columns and apply grepl. If any column contains a match, TRUE is returned for that data frame.
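The same logic can be written without purrr. The sketch below is a base-R equivalent using vapply(), a swapped-in technique rather than part of the original answer; `getIdxBase` is a hypothetical name, and the third data frame `c2` is made-up test data showing that several matching frames are returned together by logical indexing.

```r
# Base-R equivalent of getIdx(): one TRUE/FALSE per data frame in `l`
getIdxBase <- function(pattern, l) {
  vapply(l, function(df) {
    # TRUE if any column of this data frame contains a match
    any(vapply(df, function(col) any(grepl(pattern, col)), logical(1)))
  }, logical(1))
}

a  <- data.frame(yes = c("pension"))
b  <- data.frame(no = c("other"))
c2 <- data.frame(maybe = c("other", "pension plan"))  # hypothetical extra frame
my_list <- list(a, b, c2)

getIdxBase("pension", my_list)
# [1]  TRUE FALSE  TRUE
my_list[getIdxBase("pension", my_list)]  # both matching data frames
which(getIdxBase("pension", my_list))    # integer positions, if needed
# [1] 1 3
```

vapply() is used instead of sapply() because it guarantees a logical vector even when the list is empty.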

Julius Vainora
  • I'm very sorry, I'm still a bit new; your answer works perfectly, but my question was wrong. What I was really hoping for was a solution for this: a <- data.frame(yes = c("pension"), yes1 = "other"); b <- data.frame(no = c("other"), no1 = "other"); my_list <- list(a, b) – David Lucey Dec 23 '18 at 16:51
  • @DavidLucey, as I understand it, the only difference in your new example is the multiple columns. If you still want to find "pension" anywhere in a data frame, then my answer already covers such cases. So I believe everything's fine, right? – Julius Vainora Dec 23 '18 at 17:24
  • I agree; something is still not working for me, but I'm pretty sure I can get there with your solution. Many thanks! – David Lucey Dec 23 '18 at 18:30
  • @DavidLucey, no problem. What error do you get? Perhaps I'll have an idea what's not working. – Julius Vainora Dec 23 '18 at 18:35
  • Think I have it. My object was actually list(list(df)), so I added recursive=TRUE to your getIdx function, which returned a logical vector, and then my_list[which(getIdx("pension", my_list))] worked. – David Lucey Dec 23 '18 at 19:28
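The comments raise two variations: data frames with several columns (which getIdx already handles, since it checks every column) and a list of lists such as list(list(df)). The exact change the asker made to getIdx isn't shown, so the following is only an assumed way to handle nesting: a hypothetical recursive helper `has_match` that descends into lists until it reaches data frames, followed by which() for integer positions.

```r
library(purrr)

# Hypothetical recursive helper: TRUE if `pattern` occurs anywhere inside `x`
has_match <- function(x, pattern) {
  if (is.data.frame(x)) {
    # check every column of the data frame
    any(map_lgl(x, function(col) any(grepl(pattern, col))))
  } else if (is.list(x)) {
    # descend into nested lists
    any(map_lgl(x, has_match, pattern = pattern))
  } else {
    any(grepl(pattern, x))
  }
}

# Multi-column frames from the comment thread
a <- data.frame(yes = c("pension"), yes1 = "other")
b <- data.frame(no = c("other"), no1 = "other")
nested <- list(list(a), list(b))  # list(list(df)) structure

flags <- map_lgl(nested, has_match, pattern = "pension")
flags
# [1]  TRUE FALSE
nested[which(flags)]
```

The data-frame check comes before the generic list check because a data frame is also a list; reversing the order would still work but would lose the explicit column-wise intent.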