
I have been playing around with list.files(), and I wanted to list only 001.csv through 010.csv. I came up with this command:

list_files <- list.files(directory, pattern = ".*\\000|010", full.names = TRUE)

This code gives me what I want, but I do not fully understand what is happening with the pattern argument. How does pattern = ".*\\000" work?

varsha
Chris
  • How did you come up with that command? I'm shocked it worked. I would think something like `"00\\d\\.csv|010\\.csv"`, depending on how specific you need to be. – Gregor Thomas Jan 07 '15 at 21:23
  • I didn't think `"\\0"` was a valid regex escapement anyway, unless it's being treated as just `"0"` – Drew McGowen Jan 07 '15 at 21:26
  • Gregor, I started with variations of 00*, 0*0, *00, then read about .*\\ somewhere. I messed around with .*\\ without knowing what it really does. This just ended up working for me. – Chris Jan 07 '15 at 21:31
  • Related (re: `\\0`), but I'm still confused: http://stackoverflow.com/q/4654723/903061 – Gregor Thomas Jan 07 '15 at 22:00

1 Answer


\\0 is a backreference that inserts the whole regex match to that point. Compare the following to see what that can mean:

sub("he", "", "hehello")
## [1] "hello"
sub("he\\0", "", "hehello")
## [1] "llo"

With strings like "001.csv" or "009.csv", what happens is that the .* matches zero characters, the \\0 repeats those zero characters one time, and the 00 matches the first two zeros in the string. Success!

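This pattern won't match "100.csv" or "010.csv" because it can't find anything to match that is doubled and then immediately followed by two 0s. It will, though, match "1100.csv", because it matches 1, then doubles it, and then finds two 0s. In the full pattern from the question, "010.csv" is picked up by the separate |010 alternative instead, since | splits the whole pattern into ".*\\000" or "010".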

So, to recap, ".*\\000" matches any string beginning with xx00, where x stands for any substring of zero or more characters. That is, it matches anything repeated twice and then followed by two zeros.
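If the goal is just 001.csv through 010.csv, a pattern that spells that out is probably easier to reason about than one leaning on \\0. Here is a minimal sketch, not the only way to do it: the file names in `files` are made up for illustration, and the pattern assumes the names are exactly three digits followed by ".csv".

```r
# Hypothetical file names standing in for whatever is in `directory`.
files <- c("000.csv", "001.csv", "009.csv", "010.csv", "011.csv",
           "100.csv", "1100.csv", "0010.csv", "001.csv.bak")

# ^ and $ anchor the match to the whole file name.
# The parentheses keep the alternation confined to the number part:
# either 00 followed by a digit from 1 to 9, or exactly 010.
pattern <- "^(00[1-9]|010)\\.csv$"

# Keep only the names that match the pattern.
wanted <- files[grepl(pattern, files)]
print(wanted)
```

The same pattern string can be passed as the pattern argument in the question's call, e.g. list.files(directory, pattern = pattern, full.names = TRUE), since list.files() matches the pattern against the file names. Anchoring is a design choice here: without ^ and $, names such as "0010.csv" or "001.csv.bak" would also slip through.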

Josh O'Brien