It's pretty hard to find a title for my question because it's very specific.

My problem is: I have around 9000 data files collected over different periods. The filenames contain those periods, and I only want to load into R the files that cover at least 17/18 years of data collection.

I created a test list to show what I mean:

list = c("AT0ACH10000700100dymax.1-1-1993.31-12-2003",
         "AT0ILL10000700500dymax.1-1-1990.31-12-2011", 
         "AT0PIL10000700500dymax.1-1-1992.31-12-2011",
         "AT0SON10000700100dymax.1-1-1990.31-12-2011",
         "AT0STO10000700100dymax.1-1-1992.31-12-2006",  
         "AT0VOR10000700500dymax.1-1-1991.31-12-2011",
         "AT110020000700100dymax.1-1-1993.31-12-2008",
         "AT2HE190000700100dymax.1-1-1993.31-12-2000", 
         "AT2KA110000700500dymax.1-1-1991.31-12-2010", 
         "AT2KA410000700500dymax.1-1-1991.31-12-2011")

These are the filenames. Now I want to extract all filenames whose measurement period is at least 18 years long. For example, the 1st file should be excluded because the period is too short; the 2nd one is fine. So I have to create something that either compares the dates (only the years) or checks something like start year + 18.

Oh, and the filenames don't all have the same length! This is only an example.

I have no clue how to do that. Can somebody please help?

Essi

2 Answers

Assuming the dates are always separated by ".", you can use string split. Here's an example getting the time difference in days.

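# Split each filename on the literal "." separator
split_list = strsplit(list, split=".", fixed=TRUE)

# Elements 2 and 3 hold the start and end dates (assumes no other "." in the name)
from = unlist(lapply(split_list, "[[", 2))
to = unlist(lapply(split_list, "[[", 3))
from = as.POSIXct(from, format="%d-%m-%Y", tz="UTC")
to = as.POSIXct(to, format="%d-%m-%Y", tz="UTC")

# units must be passed by name: the third positional argument of difftime() is tz
difftime(to, from, units="days")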

To get the time difference in years, there are a few different approaches. Here are two:

R: How to calculate the difference in years between a date and a year

R get date difference in years (floating point)
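
As a rough sketch of the floating-point approach, the day count can be divided by a mean year length of 365.25 days. The variable name `files` (used here to avoid masking base R's `list()`) and the choice to count the end date as a measured day are assumptions for illustration, not part of the linked solutions.

```r
# Two of the example filenames from the question
files <- c("AT0ACH10000700100dymax.1-1-1993.31-12-2003",
           "AT0ILL10000700500dymax.1-1-1990.31-12-2011")

# Split on the literal "." and pick the start and end date strings
parts <- strsplit(files, split = ".", fixed = TRUE)
from  <- as.Date(vapply(parts, `[[`, character(1), 2), format = "%d-%m-%Y")
to    <- as.Date(vapply(parts, `[[`, character(1), 3), format = "%d-%m-%Y")

# Days covered, counting the end date itself (assumption), as a plain number
days <- as.numeric(difftime(to, from, units = "days")) + 1

# Approximate length in years
years <- days / 365.25
round(years, 2)
```

Because 365.25 is only an average, the result can fall slightly below a whole number of years, so a comparison such as `years >= 18` may need a small tolerance.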

Kelli-Jean
  • Thank you! If I understand it correctly, I get a list with the difference in days. I can use an index list and apply it to my original list to get the correct files. Is that right? – Essi Nov 27 '17 at 23:41
  • Yes, that's exactly right. For example, if you wanted the difference to be more than 6575 days (~18 years * 365.25 days/year), you could find the index for which this is true: `idx = difftime(to, from, units = "days") > 6575`. And then subset the list: `list[idx]` – Kelli-Jean Nov 27 '17 at 23:53
  • Thank you! I got it! :)) – Essi Nov 28 '17 at 00:05
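
The index-and-subset idea from the comments could be sketched as follows. One caveat worth checking: a period from 1 January to 31 December exactly 18 years later spans 6573 or 6574 days, so a fixed threshold of 6575 days would drop it. The calendar-based comparison below is one possible alternative; the variable name `files` and the third filename are hypothetical and only serve to show that edge case.

```r
files <- c("AT0ACH10000700100dymax.1-1-1993.31-12-2003",
           "AT0ILL10000700500dymax.1-1-1990.31-12-2011",
           "XX0EXAMPLE.1-1-1990.31-12-2007")  # hypothetical: exactly 18 calendar years

parts <- strsplit(files, split = ".", fixed = TRUE)
from  <- as.Date(vapply(parts, `[[`, character(1), 2), format = "%d-%m-%Y")
to    <- as.Date(vapply(parts, `[[`, character(1), 3), format = "%d-%m-%Y")

# Fixed day threshold, as suggested in the comment
idx_days <- as.numeric(difftime(to, from, units = "days")) > 6575

# Calendar-based alternative: shift the start date forward by 18 years
# and check that the day after the end date reaches it
target <- as.POSIXlt(from)
target$year <- target$year + 18
idx_cal <- (to + 1) >= as.Date(target)

files[idx_days]  # drops the hypothetical 18-year file
files[idx_cal]   # keeps it
```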
An alternative solution that relies on some assumptions but gets cleanly to the desired output.

year_to   <- as.integer(sub(".*([0-9]{4}$)",      "\\1", list))
year_from <- as.integer(sub(".*-([0-9]{4})\\..*", "\\1", list))

# Assume all "from" dates start on Jan 01 and "to" dates end Dec 31
# Then the difference is 
diff <- year_to - year_from + 1
diff >= 18
# [1] FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE
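
To get from the logical vector to loaded data, one possible continuation is sketched below. The directory `path/to/data`, the variable names `files`, `keep`, and `data_list`, and the use of `read.table()` with a header row are all assumptions, since the question does not say where the files live or what format they have.

```r
# A few of the example filenames from the question
files <- c("AT0ACH10000700100dymax.1-1-1993.31-12-2003",
           "AT0ILL10000700500dymax.1-1-1990.31-12-2011",
           "AT0PIL10000700500dymax.1-1-1992.31-12-2011")

# Same year extraction as above
year_to   <- as.integer(sub(".*([0-9]{4}$)",      "\\1", files))
year_from <- as.integer(sub(".*-([0-9]{4})\\..*", "\\1", files))

# Keep only names covering at least 18 calendar years
keep <- files[(year_to - year_from + 1) >= 18]

# Hypothetical location and file format; adjust both to the real data
data_dir <- "path/to/data"
paths <- file.path(data_dir, keep)
paths <- paths[file.exists(paths)]  # skip names that are not on disk

# Read each remaining file into a list named by filename
data_list <- lapply(paths, read.table, header = TRUE)
names(data_list) <- basename(paths)
```

With the real data, `files` would more likely come from something like `list.files(data_dir)` than from a hand-written vector.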
s_baldur