
I am looking to find out which pattern is matched in a list of file names. I can find out if a match was found, but not which pattern was matched.

  local_pattern <- data.frame(
    condition = c("umhfl","dmhfl", "umhfr", "dmhfr", "shfr", "vshfr", "vshfl"),
    filename = c("*Upward motion*HF*Left*.csv", "*Downward motion*HF*Left*.csv", "*UHFR*.csv", "*DHFR*.csv", "*SHFR*.csv", "*VSHFR*.csv", "*VSHFL*.csv")
  )

## matching a sample file name
pattern_matched <- grep(paste(glob2rx(local_pattern[,2]), collapse = "|"), "./csv files/DHFR 2019-04-09 04.59 PM_001.csv", value = F)

What I would like to see is the pattern that was matched rather than simply a TRUE that a match was found.

Levi
  • Why not change `value=F` to `value=TRUE`? – IRTFM Jun 09 '19 at 20:54
  • That still doesn't tell me which pattern was found in a particular file name. – Levi Jun 09 '19 at 21:16
  • 2
    Did you already have a look here https://stackoverflow.com/questions/9537797/r-grep-match-one-string-against-multiple-patterns – Sven Jun 09 '19 at 21:22
  • Indeed, possible duplicate of the first answer over there - https://stackoverflow.com/a/9538033/496803 - which returns a TRUE/FALSE matrix for every possibility. – thelatemail Jun 09 '19 at 23:33
  • @Sven, similar but not the same. The match patterns here contain wildcards, and that solution does not work for them. – Levi Jun 10 '19 at 05:57
  • Instead of using logical-OR (regex-`"|"`) you probably need to use `sapply` on a vector of separate patterns. – IRTFM Jun 10 '19 at 15:39
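Following the suggestion in the comments to test each pattern separately instead of joining them with `"|"`, a minimal base-R sketch could look like this. The two file names are hypothetical, chosen only for illustration. The result is a logical matrix with one row per file and one column per condition, which directly shows which pattern(s) each file matched.

```r
local_pattern <- data.frame(
  condition = c("umhfl", "dmhfl", "umhfr", "dmhfr", "shfr", "vshfr", "vshfl"),
  filename  = c("*Upward motion*HF*Left*.csv", "*Downward motion*HF*Left*.csv",
                "*UHFR*.csv", "*DHFR*.csv", "*SHFR*.csv", "*VSHFR*.csv", "*VSHFL*.csv"),
  stringsAsFactors = FALSE
)

# Hypothetical file names for illustration
files <- c("./csv files/DHFR 2019-04-09 04.59 PM_001.csv",
           "./csv files/VSHFR 2019-04-10 10.15 AM_002.csv")

# Run grepl() once per translated glob; sapply() collects the results into a
# logical matrix with one row per file and one column per pattern
hits <- sapply(glob2rx(local_pattern$filename), grepl, x = files)
dimnames(hits) <- list(basename(files), local_pattern$condition)

# Condition(s) matched by each file, kept as a list because a file may match several
matched <- lapply(seq_len(nrow(hits)), function(i) local_pattern$condition[hits[i, ]])
names(matched) <- basename(files)
matched
```

With these patterns a VSHFR file matches both `*SHFR*.csv` and `*VSHFR*.csv`, so the second file maps to two conditions; keeping the full matrix makes that overlap visible.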

2 Answers


We may use `str_detect` from stringr, which is vectorised over both `string` and `pattern`:

library(stringr)
str_detect("./csv files/DHFR 2019-04-09 04.59 PM_001.csv",
           glob2rx(local_pattern[, 2]))
# [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE

so the matching pattern itself can be selected with

local_pattern[str_detect("./csv files/DHFR 2019-04-09 04.59 PM_001.csv", glob2rx(local_pattern[,2])), 2]
# [1] *DHFR*.csv
Julius Vainora
  • How do I know that the DHFR pattern was matched rather than one of the others? – Levi Jun 09 '19 at 21:18
  • This works great, thank you! Concise and outputs just the right code. The only modification I made was to output column 1 from local_pattern. – Levi Jun 10 '19 at 19:58
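Along the lines of Levi's modification (returning column 1, the condition, instead of the glob), here is a rough base-R sketch that uses `grepl` in place of `str_detect` and returns exactly one condition per file. `match_condition` is a hypothetical helper name, and the tie-breaking rule (prefer the longest glob when several match, since e.g. a VSHFR file also matches `*SHFR*.csv`) is an assumption, not something from the answer.

```r
local_pattern <- data.frame(
  condition = c("umhfl", "dmhfl", "umhfr", "dmhfr", "shfr", "vshfr", "vshfl"),
  filename  = c("*Upward motion*HF*Left*.csv", "*Downward motion*HF*Left*.csv",
                "*UHFR*.csv", "*DHFR*.csv", "*SHFR*.csv", "*VSHFR*.csv", "*VSHFL*.csv"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the condition (column 1) whose glob matches `file`.
# Assumption: when several globs match, the longest glob is the most specific one.
match_condition <- function(file, patterns = local_pattern) {
  # One TRUE/FALSE per pattern for this single file name
  hit <- vapply(glob2rx(patterns$filename), grepl, logical(1), x = file)
  if (!any(hit)) return(NA_character_)  # no pattern matched
  cand <- which(hit)
  best <- cand[which.max(nchar(patterns$filename[cand]))]
  patterns$condition[best]
}

# Hypothetical file names for illustration; the last one matches nothing
files <- c("./csv files/DHFR 2019-04-09 04.59 PM_001.csv",
           "./csv files/VSHFR 2019-04-10 10.15 AM_002.csv",
           "./csv files/notes.txt")
conditions <- vapply(files, match_condition, character(1), USE.NAMES = FALSE)
conditions
```

If the patterns never overlap in practice, the tie-break is harmless; if they do, it may be safer to make the patterns themselves mutually exclusive than to rely on glob length.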

If there are fewer than 10 patterns to match, you can collect this information with Perl-style capture groups in base R. First, build a pattern that wraps each glob in (...) so that each one becomes its own capture group.

pat <- paste(glob2rx(local_pattern$filename), collapse = ")|(")
pat <- paste0("(", pat, ")")

Assign the file name to a variable for now:

x <- "./csv files/DHFR 2019-04-09 04.59 PM_001.csv"

Use regexpr (or, with more effort, gregexpr for multiple matches), then find which capture group has a positive length:

m <- regexpr(pat, x, perl = TRUE)
sel <- which(attr(m, "capture.length") > 0)

Collect the desired information:

local_pattern[sel,]
#   condition   filename
# 4     dmhfr *DHFR*.csv

regmatches(x, m)
# [1] "./csv files/DHFR 2019-04-09 04.59 PM_001.csv"
David O
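The capture-group approach from the second answer also works on several file names at once, because `regexpr` is vectorised over its text argument and `capture.length` then becomes a matrix with one row per file. A minimal sketch; the file names are hypothetical, and the `apply` step that turns each row into a single pattern index (or `NA` when nothing matched) is an addition, not part of the answer.

```r
local_pattern <- data.frame(
  condition = c("umhfl", "dmhfl", "umhfr", "dmhfr", "shfr", "vshfr", "vshfl"),
  filename  = c("*Upward motion*HF*Left*.csv", "*Downward motion*HF*Left*.csv",
                "*UHFR*.csv", "*DHFR*.csv", "*SHFR*.csv", "*VSHFR*.csv", "*VSHFL*.csv"),
  stringsAsFactors = FALSE
)

# One capture group per glob, joined by regex alternation
pat <- paste0("(", paste(glob2rx(local_pattern$filename), collapse = ")|("), ")")

# Hypothetical file names for illustration; the last one matches no pattern
files <- c("./csv files/DHFR 2019-04-09 04.59 PM_001.csv",
           "./csv files/UHFR 2019-04-09 05.10 PM_002.csv",
           "./csv files/VSHFR 2019-04-10 10.15 AM_003.csv",
           "./csv files/notes.txt")

# capture.length: one row per file, one column per capture group (= pattern)
m  <- regexpr(pat, files, perl = TRUE)
cl <- attr(m, "capture.length")

# First capture group with a positive length in each row, or NA if none
sel <- apply(cl > 0, 1, function(row) if (any(row)) which(row)[1] else NA_integer_)

result <- data.frame(file = basename(files),
                     condition = local_pattern$condition[sel],
                     stringsAsFactors = FALSE)
result
```

Because alternation tries the patterns in order, the VSHFR file is reported as `shfr` here: `*SHFR*.csv` comes before `*VSHFR*.csv` in `local_pattern` and already matches. Putting the more specific patterns first is one way to avoid that.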