
I am very new to R and have no experience with regular expressions, so any help would be really appreciated.

I am reading a directory and trying to find the files whose names contain the number "22953"; I then want to read the newest of those files. The date is also written in each file's name.

Files in the directory:

inv_22953_20190828023258_112140.csv
inv_22953_20190721171018_464152.csv
inv_8979_20190828024558_112140.csv

The problem is that I can't rely on the position of the date within the string because, as you can see, some file names have fewer characters. Maybe a solution would be to locate the date between the 2nd and 3rd underscore.

filepath <- "T:/Pricing/Workstreams/Business Management/EU/01_Operations/02_Carveouts/05_ImplementationTest/"

list.files(filepath)[which.max(suppressWarnings(ymd_hm(substr(list.files(filepath, pattern="_22953"),11,22))))]
Matthew

1 Answer

library(lubridate)

# First find the files with 22953 inside
myFiles <- grep("22953", list.files(filepath), value = TRUE)

# Then isolate the date and find which file has the newest (maximum) date:

regex <- "^.*_.*_([0-9]{4})([0-9]{2})([0-9]{2}).*\\.csv$"

# Parse the dates once, then keep the file(s) with the latest one
fileDates <- as_date(sub(regex, "\\1-\\2-\\3", myFiles))
myFiles[which(fileDates == max(fileDates))]
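
As a quick check, here is a minimal sketch of what the sub() call returns for the three file names from the question. It uses base R's as.Date instead of lubridate's as_date so that it runs without extra packages, and the vector name sampleFiles is made up for illustration.

```r
# File names copied from the question (sampleFiles is a hypothetical name)
sampleFiles <- c("inv_22953_20190828023258_112140.csv",
                 "inv_22953_20190721171018_464152.csv",
                 "inv_8979_20190828024558_112140.csv")

regex <- "^.*_.*_([0-9]{4})([0-9]{2})([0-9]{2}).*\\.csv$"

# Rewrite each name as "YYYY-MM-DD" using the three captured groups
dateStrings <- sub(regex, "\\1-\\2-\\3", sampleFiles)
print(dateStrings)

# Keep only the names containing 22953, then pick the one with the newest date
keep <- grepl("22953", sampleFiles)
fileDates <- as.Date(dateStrings[keep])
newest <- sampleFiles[keep][which.max(fileDates)]
print(newest)
```

Note that only the date part of the timestamp is compared here, so two files from the same day would tie.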

Explanation of the regular expression

  • ^ matches the beginning of the string (says "whatever comes next is at the beginning")
  • .* matches any character 0 or more times
  • _ matches an underscore
  • [0-9]{4} matches exactly 4 digits (characters between 0 and 9)
  • [0-9]{2} matches exactly 2 digits
  • stuff between parentheses is captured for use in the replacement string
  • \\1 refers to the first group in parentheses, \\2 to the second, and \\3 to the third
  • \\.csv matches a literal ".csv" (the backslashes escape the dot, which would otherwise match any character)
  • $ matches the end of the string (together with \\.csv it says "the string ends in .csv")
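
Since the question suggests taking whatever sits between the 2nd and 3rd underscore, a regex-free alternative is to split each name on "_" and parse the third piece as a full timestamp. This is only a sketch, assuming every name follows the inv_<id>_<YYYYmmddHHMMSS>_<number>.csv pattern shown in the question; newestFile and sampleFiles are hypothetical names, and it uses base R's as.POSIXct rather than lubridate.

```r
# Hypothetical helper: newest file for a given id, assuming names look like
# inv_<id>_<YYYYmmddHHMMSS>_<number>.csv
newestFile <- function(files, id) {
  parts  <- strsplit(files, "_", fixed = TRUE)
  ids    <- vapply(parts, function(p) p[2], character(1))
  stamps <- vapply(parts, function(p) p[3], character(1))

  # Exact match on the id field, so "22953" inside a timestamp is ignored
  keep <- !is.na(ids) & ids == id
  if (!any(keep)) return(character(0))

  # Parse the full timestamp (date and time), not only the date
  when <- as.POSIXct(stamps[keep], format = "%Y%m%d%H%M%S", tz = "UTC")
  files[keep][which.max(as.numeric(when))]
}

# File names copied from the question
sampleFiles <- c("inv_22953_20190828023258_112140.csv",
                 "inv_22953_20190721171018_464152.csv",
                 "inv_8979_20190828024558_112140.csv")

print(newestFile(sampleFiles, "22953"))
```

With the real directory, the call would presumably be newestFile(list.files(filepath), "22953"), and the result could be passed to read.csv(file.path(filepath, ...)).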
Brigadeiro
  • Great, I managed to get some inspiration from it. Do you think there is a nicer way to write it? r <- grep("22953", list.files(filepath), value = T); r1 <- which.max(ymd_hms(substr(r,11,24))); r[r1] – Youssef Yassine Sep 08 '19 at 19:44
  • Amazing!! So elegant and nice, yet the regex is super complicated; I could never understand it! But Sir, thank you so much – Youssef Yassine Sep 08 '19 at 19:57
  • Just added an explanation of the regex for you – Brigadeiro Sep 08 '19 at 20:04