Here's the reprex we'll need to create in our working directory:

library(tidyverse)
library(openxlsx)
library(readxl)
write.xlsx(list(iris), "AA-excel-file.xlsx")
write.xlsx(list(iris), "BB-excel-file.xlsx")
write.xlsx(list(iris), "CC-excel-file.xlsx")
write.xlsx(list(iris), "DD-excel-file.xlsx")
write.xlsx(list(iris), "EE-excel-file.xlsx")

And my working directory looks something like this:

C:
├── my-R-working-directory/
    ├── AA-excel-file.xlsx
    ├── BB-excel-file.xlsx
    ├── CC-excel-file.xlsx
    ├── DD-excel-file.xlsx
    └── EE-excel-file.xlsx

I've crafted a regular expression (demo here) that "selects" any file that does not begin with either AA or BB:

^(?!AA|BB)\w+$

I want to use this regex with base R list.files() to list all files that do not begin with either AA or BB. Here's my attempt:

list.files("path/of/folder", pattern = "\\^(?!AA|BB)\w+$.xlsx$", full.names = TRUE)
#> Error: '\w' is an unrecognized escape in character string starting ""\\^(?!AA|BB)\w"
#> Error: unexpected ')' in "           full.names = TRUE)"

I think my pattern argument is slightly off. This similar command does work fine, but doesn't exclude the AA and BB files:

list.files("path/of/folder", pattern = "\\.xlsx$", full.names = TRUE)

How do I properly write the pattern argument to exclude any files that start with AA or BB? And if you can, could you also correct my regular expression? It only seems to match names made of letters, digits, and underscores. Any white space, dashes, dots, etc. break it (see demo).

Display name
  • Couldn't you just do `grep("^(AA|BB).*", list.files("path/of/folder", pattern = "\\.xlsx$"), invert = TRUE, value = TRUE)` ? – Allan Cameron Sep 11 '20 at 17:42
  • @Allan Cameron that almost works. There seem to be two issues: (1) I only want to include `*.xlsx` files and this doesn't seem to accomplish that, and (2) it also does not exclude directories (folders). – Display name Sep 11 '20 at 17:48
  • Jason - see my formal answer. This is run in my R home directory, which has multiple other files and directories. It's maybe modified a bit from my original suggestion. – Allan Cameron Sep 11 '20 at 17:50
  • @Allan Cameron Yup, your formal answer appears to have solved it. Thanks. I'll wait the customary 24 hours to give others a chance to answer as well. – Display name Sep 11 '20 at 17:55
  • I'm pleased that worked Jason. I'd be interested to see if there's a solution just using `pattern` – Allan Cameron Sep 11 '20 at 17:58

1 Answer

You could use pattern to get all the xlsx files, then use grep() with invert = TRUE to drop those starting with AA or BB:

library(tidyverse)
library(openxlsx)
library(readxl)

write.xlsx(list(iris), "AA-excel-file.xlsx")
write.xlsx(list(iris), "BB-excel-file.xlsx")
write.xlsx(list(iris), "CC-excel-file.xlsx")
write.xlsx(list(iris), "DD-excel-file.xlsx")
write.xlsx(list(iris), "EE-excel-file.xlsx")

grep("^(AA|BB).*", list.files(pattern = "\\.xlsx$"), invert = TRUE, value = TRUE)
#> [1] "CC-excel-file.xlsx" "DD-excel-file.xlsx" "EE-excel-file.xlsx"
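
As a possible variation on the same idea, here is a minimal sketch that keeps full paths and also tries a pattern-only route. It rests on a few assumptions that are not in the question: the folder is a throwaway directory under tempdir(), the files are empty placeholders made with file.create() rather than real workbooks, and the names all_xlsx, kept_files and pattern_only are made up for illustration. The negative lookahead is applied with grepl(perl = TRUE) because list.files() has no perl argument, and the `\w+` from the question is swapped for `.*` so that dashes and dots in the file name no longer break the match. Note that every backslash has to be doubled inside an R string, which is what the '\w' error was complaining about.

```r
# Hypothetical demo folder with empty placeholder files (not real workbooks).
folder <- file.path(tempdir(), "regex-demo")
dir.create(folder, showWarnings = FALSE)
demo_names <- paste0(c("AA", "BB", "CC", "DD", "EE"), "-excel-file.xlsx")
invisible(file.create(file.path(folder, demo_names)))

# Step 1: let list.files() find everything ending in .xlsx, with full paths.
all_xlsx <- list.files(folder, pattern = "\\.xlsx$", full.names = TRUE)

# Drop anything that is actually a directory whose name happens to end in .xlsx.
all_xlsx <- all_xlsx[!dir.exists(all_xlsx)]

# Step 2: apply the negative lookahead to the bare file name, not the full path.
# perl = TRUE is needed for (?!...); ".*" also matches dashes, dots and spaces.
keep <- grepl("^(?!AA|BB).*\\.xlsx$", basename(all_xlsx), perl = TRUE)
kept_files <- all_xlsx[keep]
basename(kept_files)
#> [1] "CC-excel-file.xlsx" "DD-excel-file.xlsx" "EE-excel-file.xlsx"

# Pattern-only alternative without lookahead: the name must start with a
# character other than A or B, or with A followed by non-A, or B followed by
# non-B. Assumption: the part before ".xlsx" is at least two characters long.
pattern_only <- list.files(folder,
                           pattern = "^([^AB]|A[^A]|B[^B]).*\\.xlsx$",
                           full.names = TRUE)
identical(pattern_only, kept_files)
```

The two-step version is probably the easier one to read and extend. The pattern-only version gets awkward quickly if more prefixes need to be excluded.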
Allan Cameron