
I have two vectors of timestamps, A and B.

A = c("2015-11-02 08:30:00.054", "2015-11-02 08:30:00.060", "2015-11-02 08:30:00.060", "2015-11-02 08:30:00.062", "2015-11-02 08:30:00.952")

B = c("2015-11-02 08:30:00.016", "2015-11-02 08:30:00.029", "2015-11-02 08:30:00.030", "2015-11-02 08:30:00.045", "2015-11-02 08:30:00.048", "2015-11-02 08:30:00.054", "2015-11-02 08:30:00.056", "2015-11-02 08:30:00.078", "2015-11-02 08:30:00.079", "2015-11-02 08:30:00.079", "2015-11-02 08:30:00.246", "2015-11-02 08:30:00.247", "2015-11-02 08:30:00.251", "2015-11-02 08:30:00.251", "2015-11-02 08:30:00.252")

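I will index the elements of A and B by i and j, so here $i \in \{1,...,5\}$ and $j \in \{1,...,15\}$. Each pair of consecutive timestamps defines an interval, so the desired output matrix Z has one row per interval in A and one column per interval in B: (length(A) - 1) by (length(B) - 1), in this case 4 by 14.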

For each i and j, I want to check whether the time intervals [A_i, A_{i+1}] and [B_j, B_{j+1}] overlap at all, and if they do, record 1 in Z_{i,j}.

For example, for i=1 and j=1, ["2015-11-02 08:30:00.054", "2015-11-02 08:30:00.060"] and ["2015-11-02 08:30:00.016", "2015-11-02 08:30:00.029"] do not overlap, so the output for Z_{1,1} is 0.
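The overlap test described above can be written without an explicit loop. The following is a minimal sketch, not a tested solution for the full data: it assumes the intervals are closed (touching endpoints count as an overlap, which may differ from what `int_overlaps` returns in a given lubridate version), it parses with `tz = "UTC"` as an arbitrary choice, and the helper name `to_sec` is hypothetical.

```r
# Timestamps from the question.
A <- c("2015-11-02 08:30:00.054", "2015-11-02 08:30:00.060",
       "2015-11-02 08:30:00.060", "2015-11-02 08:30:00.062",
       "2015-11-02 08:30:00.952")
B <- c("2015-11-02 08:30:00.016", "2015-11-02 08:30:00.029",
       "2015-11-02 08:30:00.030", "2015-11-02 08:30:00.045",
       "2015-11-02 08:30:00.048", "2015-11-02 08:30:00.054",
       "2015-11-02 08:30:00.056", "2015-11-02 08:30:00.078",
       "2015-11-02 08:30:00.079", "2015-11-02 08:30:00.079",
       "2015-11-02 08:30:00.246", "2015-11-02 08:30:00.247",
       "2015-11-02 08:30:00.251", "2015-11-02 08:30:00.251",
       "2015-11-02 08:30:00.252")

# Hypothetical helper: parse once into numeric seconds (tz is an assumption).
to_sec <- function(x) {
  as.numeric(as.POSIXct(x, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"))
}
a <- to_sec(A)
b <- to_sec(B)

# Start and end of each interval [x_k, x_{k+1}].
a_start <- a[-length(a)]; a_end <- a[-1]
b_start <- b[-length(b)]; b_end <- b[-1]

# Two closed intervals overlap iff each one starts no later than the other ends.
Z <- outer(a_start, b_end, "<=") & outer(a_end, b_start, ">=")
Z <- Z * 1L  # convert the logical matrix to 0/1

Z[1, 1]  # the i = 1, j = 1 case from the example
```

Replacing `"<="` and `">="` with `"<"` and `">"` would exclude intervals that only touch at an endpoint. Note that `outer` materialises the full matrix, so with 10,000 elements per vector each intermediate result has roughly 10^8 entries.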

What I do now is loop over every i and j and use the as.interval and int_overlaps functions from the lubridate package. This is terribly slow, as my A and B vectors often contain more than 10,000 elements, so I would like a vectorized solution if possible.

I tried the following to generate the ii by jj matrix, which is terribly inefficient:

r1 = as.interval( strptime(as.POSIXlt((A), format = "%Y-%m-%d %H:%M:%OS"), format = "%Y-%m-%d %H:%M:%OS")[ii], strptime(as.POSIXlt((A), format = "%Y-%m-%d %H:%M:%OS"), format = "%Y-%m-%d %H:%M:%OS")[ii+1])

r2 = as.interval( strptime(as.POSIXlt((B), format = "%Y-%m-%d %H:%M:%OS"), format = "%Y-%m-%d %H:%M:%OS")[jj], strptime(as.POSIXlt((B), format = "%Y-%m-%d %H:%M:%OS"), format = "%Y-%m-%d %H:%M:%OS")[jj+1])

int_overlaps(r1,r2)

To conclude, I have the following for now. It works, but it is terribly slow.

library(lubridate)  # provides as.interval() and int_overlaps()

# Parse character timestamps (with fractional seconds) into date-times.
tch = function(time_vector){
  strptime(as.POSIXlt(as.character(time_vector), format = "%Y-%m-%d %H:%M:%OS",
                      tzone = "CT"), format = "%Y-%m-%d %H:%M:%OS")
}

Z = matrix(0, length(A)-1, length(B)-1)
for (ii in 1:(length(A)-1)){
  for (jj in 1:(length(B)-1)){
    r1 = as.interval(tch(A)[ii], tch(A)[ii+1])
    r2 = as.interval(tch(B)[jj], tch(B)[jj+1])
    # record 1 when the two intervals overlap
    if (int_overlaps(r1, r2)){
      Z[ii,jj] = 1
    }
    # assuming sorted timestamps: once the run of overlaps in this row ends, skip the rest
    if (jj > 1 && Z[ii,jj] == 0 && Z[ii,(jj-1)] == 1){
      break
    }
  }
  print(paste(jj,ii))  # progress indicator
}
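The early `break` in the loop above relies on both vectors being sorted. Under that same assumption, the overlapping columns of each row form one contiguous run, so it may be enough to compute only the first and last overlapping j for each i. This is a rough sketch using base R's `findInterval`, again treating intervals as closed and using a hypothetical `to_sec` helper with `tz = "UTC"`:

```r
# Timestamps from the question.
A <- c("2015-11-02 08:30:00.054", "2015-11-02 08:30:00.060",
       "2015-11-02 08:30:00.060", "2015-11-02 08:30:00.062",
       "2015-11-02 08:30:00.952")
B <- c("2015-11-02 08:30:00.016", "2015-11-02 08:30:00.029",
       "2015-11-02 08:30:00.030", "2015-11-02 08:30:00.045",
       "2015-11-02 08:30:00.048", "2015-11-02 08:30:00.054",
       "2015-11-02 08:30:00.056", "2015-11-02 08:30:00.078",
       "2015-11-02 08:30:00.079", "2015-11-02 08:30:00.079",
       "2015-11-02 08:30:00.246", "2015-11-02 08:30:00.247",
       "2015-11-02 08:30:00.251", "2015-11-02 08:30:00.251",
       "2015-11-02 08:30:00.252")

# Hypothetical helper: parse once into numeric seconds (tz is an assumption).
to_sec <- function(x) {
  as.numeric(as.POSIXct(x, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"))
}
a <- to_sec(A)
b <- to_sec(B)

# The sketch is only valid for sorted input.
stopifnot(!is.unsorted(a), !is.unsorted(b))

m <- length(b) - 1L          # number of B intervals
a_start <- a[-length(a)]
a_end   <- a[-1]

# First overlapping j: the first B interval whose end b[j + 1] is >= a_start.
# With left.open = TRUE, findInterval counts the elements of b strictly below a_start.
first <- pmax(findInterval(a_start, b, left.open = TRUE), 1L)

# Last overlapping j: the last B interval whose start b[j] is <= a_end.
last <- pmin(findInterval(a_end, b), m)

# Rows with last < first have no overlapping B interval.
ranges <- cbind(first = first, last = last)
ranges
```

Each row of `ranges` gives the span of columns that would hold 1 in Z for that row, which avoids building the dense matrix until it is actually needed.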

Any help will be greatly appreciated!

jay2020
  • I'd suggest you make it a minimal reproducible example, people might find it hard to see what's going on. Perhaps `foverlaps` from the `data.table` package is already what you need - there are plenty of examples on how to find overlaps between date-time-values on SO. – lukeA Feb 17 '16 at 05:59
  • Thanks lukeA, I updated the question. – jay2020 Feb 17 '16 at 23:23
  • The question needs to be simplified. No one has the time to go through your code. – Lazarus Thurston Jul 02 '17 at 20:17

0 Answers