I want to create about 10,000 Date objects between 2008-01-01 and 2010-12-31. I wrote the code below for that, but what I actually want is the day of the year: days 1-366 in 2008 (a leap year, because of 2008-02-29), restarting at 1 on 2009-01-01. I could generate the dates separately for 2008, then 2009, then 2010, but that would not be convenient. I was reading about lubridate but could not figure it out. I could also filter 1 to 366, then 367 to 731, and so on, but that would not be efficient either. Does anyone know a better way to do it?

    set.seed(123)
    tim1=sample(365*3+1,10000,replace = TRUE)   ### that plus 1 from feb 29 in 2008
    dat1=as.Date(tim1,origin="2007-12-31")   # then 1 will be 2008-01-01
iHermes

1 Answer

You can create a vector of all the target dates and sample from it. To create the vector, there is seq.Date, the seq method for objects of class "Date".

    start <- as.Date("2008-01-01")
    end <- as.Date("2010-12-31")
    s <- seq(start, end, by = "days")
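As a quick sanity check on the leap year, the sequence should hold 366 + 365 + 365 = 1096 dates. A minimal sketch, using only base R and the same `start`/`end` values as above:

```r
start <- as.Date("2008-01-01")
end <- as.Date("2010-12-31")
s <- seq(start, end, by = "days")

# Total number of days in the three years (2008 is a leap year)
n_days <- length(s)                     # 1096

# Number of days per calendar year
per_year <- table(format(s, "%Y"))      # 2008: 366, 2009: 365, 2010: 365
print(n_days)
print(per_year)
```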

The vector s includes all days between start and end. Now sample from it.

    set.seed(123)
    dat1 <- sample(s, 10000, TRUE)

Transform the sample into the day of the year with the "%j" format; see help("strptime").

    as.numeric(format(dat1, format = "%j"))
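To see the restart at the year boundary, here is a small illustration on two hand-picked dates (the vector `d` is only for demonstration and is not part of the answer's code):

```r
# Last day of the leap year 2008 and first day of 2009
d <- as.Date(c("2008-12-31", "2009-01-01"))

doy_chr <- format(d, format = "%j")   # "366" "001" (zero-padded strings)
doy_num <- as.numeric(doy_chr)        # 366 1
print(doy_chr)
print(doy_num)
```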

Finally, remove s, since it is no longer needed.

    rm(s)    # tidy up

Edit.

The following two functions both do what the question asks for, using two different methods.
f1 is the code above wrapped in a function; f2 uses ave/seq_along/match and is a bit more complicated. The tests show f2 to be twice as fast as f1.

    f1 <- function(start_date, end_date, n){
      start <- as.Date(start_date)
      end <- as.Date(end_date)
      s <- seq(start, end, by = "days")      # every day in the range
      y <- sample(s, n, replace = TRUE)      # n random dates
      as.numeric(format(y, format = "%j"))   # day of the year, 1-366
    }

    f2 <- function(start_date, end_date, n){
      start <- as.Date(start_date)
      end <- as.Date(end_date)
      s <- seq(start, end, by = "days")
      y <- sample(s, n, replace = TRUE)
      # number the days of s within each calendar year (1, 2, ...);
      # this equals the day of the year when start_date is January 1st
      z <- ave(as.integer(s), lubridate::year(s), FUN = seq_along)
      # look up each sampled date in s and return its within-year number
      z[match(y, s)]
    }

    set.seed(123)
    x1 <- f1("2008-01-01", "2010-12-31", 100)
    set.seed(123)
    x2 <- f2("2008-01-01", "2010-12-31", 100)

    all.equal(x1, x2)
    #[1] TRUE
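A third variant, sketched here as an alternative and not taken from the original answer, reads the day of the year from the `yday` component of a `POSIXlt` object. `yday` counts from 0, so 1 is added. The name `f3` is hypothetical and its speed was not measured in the tests.

```r
# Hypothetical helper (not in the original answer): base R only, no lubridate
f3 <- function(start_date, end_date, n){
  s <- seq(as.Date(start_date), as.Date(end_date), by = "days")
  y <- sample(s, n, replace = TRUE)
  as.POSIXlt(y)$yday + 1L   # yday is 0-based, so shift to 1-366
}

# Check the conversion on a few fixed dates
d <- as.Date(c("2008-02-29", "2008-12-31", "2009-01-01", "2010-12-31"))
doy <- as.POSIXlt(d)$yday + 1L          # 60 366 1 365
print(doy)

set.seed(123)
x3 <- f3("2008-01-01", "2010-12-31", 100)
print(range(x3))
```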

Now the tests.

    library(microbenchmark)

    mb <- microbenchmark(
      f1 = f1("2008-01-01", "2010-12-31", 1e4),
      f2 = f2("2008-01-01", "2010-12-31", 1e4),
      times = 50
    )
    print(mb, order = "median")

    ggplot2::autoplot(mb)

Rui Barradas
  • Hi @RuiBarradas, thank you. I think I didn't explain my problem well. After 2008-12-31, I need the dates to start again, but as numbers: `set.seed(123) X1=sample(365*1+1,200,replace = TRUE) Y1=as.Date(X1,origin="2007-12-31") head(Y1) X2=sample(365*1,200,replace = TRUE) Y2=as.Date(X2,origin="2008-12-31") head(Y2)` I have rbind(X1,X2); I need rbind(Y1,Y2), but I don't want to do it by hand like I did here, since I have 11 years or more. I also don't know how many dates there are in each year (I assumed 200 but it might be ) – iHermes Feb 07 '20 at 16:16
  • @iHermes Another edit, with another function and a speed test. – Rui Barradas Feb 07 '20 at 16:57
  • Thank you @RuiBarradas. Out of curiosity, if we know 2008-01-01 was a Tuesday and we want to create a day-of-the-week column for that `dat1`, how would we do that? – iHermes Feb 07 '20 at 17:11
  • Formats `"%U"` and `"%V"` give the week of the year, the first as a number in 0-53, the latter in 1-53. See the same help page for these and all other accepted formats. – Rui Barradas Feb 07 '20 at 17:25
  • I was thinking of the weekday name, but the link you provided helped me. `as.character(format(dat1, format ="%A"))` ` dat1 num1 dname1` `1 2009-06-09 160 Tuesday` `2 2011-12-11 345 Sunday` `3 2010-01-17 17 Sunday` `4 2012-06-01 153 Friday` `5 2012-09-14 258 Friday` `6 2008-03-24 84 Monday ` – iHermes Feb 07 '20 at 19:21
  • @iHermes Weekday names are locale dependent; maybe use the weekday number, `"%u"` or `"%w"`? – Rui Barradas Feb 07 '20 at 19:36
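
To illustrate the last comment, here is a short sketch of the two weekday-number formats on two hand-picked dates (the vector `d` is only for demonstration). `"%u"` numbers Monday to Sunday as 1-7, while `"%w"` numbers Sunday to Saturday as 0-6, so the two only differ on Sundays. Neither depends on the locale.

```r
# 2008-01-01 was a Tuesday (as noted in the comments); 2008-01-06 was a Sunday
d <- as.Date(c("2008-01-01", "2008-01-06"))

wd_u <- format(d, "%u")        # "2" "7"  (Monday = 1, ..., Sunday = 7)
wd_w <- format(d, "%w")        # "2" "0"  (Sunday = 0, ..., Saturday = 6)
wd_lt <- as.POSIXlt(d)$wday    # 2 0, same convention as "%w", already numeric
print(wd_u)
print(wd_w)
print(wd_lt)
```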