I am trying to create a pairwise DTW (Dynamic Time Warping) matrix in Python. I have the code below, but my current output is a matrix full of infinity, which is wrong. I cannot figure out what I am doing incorrectly.

def calc_pairwise_dtw_cost(x, y):
    
    cost_matrix = np.zeros((len(y), len(x)))
    dtw_cost = None


    for i in range(0, len(y)):
        for j in range(0, len(x)):
            cost_matrix[i, j] = float('inf')
        

    for i in range(1, len(y)):
        for j in range(1, len(x)):
            dtw_cost = cost_matrix[-1,-1]
            cost_matrix[i, j] = dtw_cost + min(
                cost_matrix[i-1, j],    # insertion
                cost_matrix[i, j-1],    # deletion
                cost_matrix[i-1, j-1])   # match

    return cost_matrix

Current output is:

array([[inf, inf, inf, ..., inf, inf, inf],
       [inf, inf, inf, ..., inf, inf, inf],
       [inf, inf, inf, ..., inf, inf, inf],
       ...,
       [inf, inf, inf, ..., inf, inf, inf],
       [inf, inf, inf, ..., inf, inf, inf],
       [inf, inf, inf, ..., inf, inf, inf]])
martineau
Blueboots

2 Answers

1

Under IEEE 754 rules, inf plus any finite value (or plus another inf) is still inf. You filled the whole matrix with infinities, then read a value out of it and added another, so every result is necessarily infinite.
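To make the propagation concrete, here is a minimal sketch that runs the same min-of-three-neighbours recurrence on a small matrix, once with every cell left at inf and once with a single finite starting cell. The helper name `fill`, the 4x4 size, and the constant `step_cost` of 1.0 are illustrative assumptions, not part of the question's code.

```python
import numpy as np

def fill(seed_origin):
    """Run the min-of-three-neighbours recurrence on a 4x4 matrix of inf."""
    m = np.full((4, 4), np.inf)
    if seed_origin:
        m[0, 0] = 0.0  # the single finite starting cell
    for i in range(1, 4):
        for j in range(1, 4):
            step_cost = 1.0  # placeholder local cost, purely illustrative
            m[i, j] = step_cost + min(m[i-1, j], m[i, j-1], m[i-1, j-1])
    return m

unseeded = fill(seed_origin=False)
seeded = fill(seed_origin=True)

print(np.isinf(unseeded).all())  # True: nothing finite ever enters the matrix
print(seeded[3, 3])              # 3.0: finite values spread out from [0, 0]
```

Without a finite cell somewhere, `min(...)` only ever sees inf, so the sum can never become finite.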

Drew Hall
  • Ah, that makes sense then! However, I am following the pseudo-code linked below. I think I am misunderstanding what happens at the line where infinity is needed, or at least referenced. Source: https://en.wikipedia.org/wiki/Dynamic_time_warping – Blueboots Oct 08 '21 at 22:43
1

Your implementation has several errors, including:

  • the input arrays x and y are never used
  • cost_matrix is not declared with the right dimensions (it needs one extra row and one extra column)
  • cost_matrix[0, 0] should be initialized to 0

The article "Dynamic Time Warping: Explanation and Code Implementation" provides an implementation of DTW that closely follows the Wikipedia pseudo-code:

Code

import numpy as np

def dtw(s, t):
    n, m = len(s), len(t)
    # (n+1) x (m+1) matrix filled with inf; note the +1 added to n & m
    dtw_matrix = np.full((n + 1, m + 1), np.inf)
    dtw_matrix[0, 0] = 0               # [0,0] element set to 0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i-1] - t[j-1])  # use inputs s & t
            # take the min of the three predecessor cells (up, left, diagonal)
            last_min = min(dtw_matrix[i-1, j], dtw_matrix[i, j-1], dtw_matrix[i-1, j-1])
            dtw_matrix[i, j] = cost + last_min
    return dtw_matrix
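
If only the total alignment cost is needed, it is the bottom-right entry of the accumulated matrix, not the local variable `cost` (which holds just the last pairwise distance). The sketch below repeats the function so it runs on its own; the sample sequences `s` and `t` are made-up illustrative inputs.

```python
import numpy as np

def dtw(s, t):
    """Return the (n+1) x (m+1) accumulated-cost matrix for sequences s and t."""
    n, m = len(s), len(t)
    dtw_matrix = np.full((n + 1, m + 1), np.inf)
    dtw_matrix[0, 0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i-1] - t[j-1])
            dtw_matrix[i, j] = cost + min(dtw_matrix[i-1, j],
                                          dtw_matrix[i, j-1],
                                          dtw_matrix[i-1, j-1])
    return dtw_matrix

s = [1, 2, 3]        # illustrative input
t = [1, 2, 2, 3]     # same shape as s, with the middle value repeated

matrix = dtw(s, t)
total_cost = matrix[-1, -1]  # accumulated cost of the best full alignment
print(total_cost)            # 0.0, since the repeated 2 can be warped onto one 2
```

The first row and column of the returned matrix are the inf boundary, so `matrix[1:, 1:]` gives the costs for the actual element pairs.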
DarrylG
  • How can we return just the cost? I tried returning `cost` and I get a float but it is incorrect. – Hefe Feb 12 '22 at 01:01
  • @Julien -- can you give an example where the value of cost is incorrect i.e. `return cost` ? – DarrylG Feb 12 '22 at 06:21
  • US Brazil 2020-01-23 0.0 0.0 2020-01-24 1.0 0.0 2020-01-25 0.0 0.0 2020-01-26 3.0 0.0 2020-01-27 0.0 0.0 ... ... ... 2020-08-17 35112.0 19373.0 2020-08-18 44091.0 47784.0 2020-08-19 47408.0 49298.0 2020-08-20 44023.0 45323.0 2020-08-21 48693.0 30355.0 Here is an example. When I load those two series in as `(s, t)`, the cost I get out is `18338.0`, which according to my teacher is the incorrect answer. – Hefe Feb 12 '22 at 19:08
  • I'm sorry, that's super messy. I'm not sure how to comment two long series in the correct format here. – Hefe Feb 12 '22 at 19:09
  • @Julien -- DTW is for two time series. Instead, you have dates intermixed with values which is a bit different. – DarrylG Feb 12 '22 at 19:19
  • I got you, I'm sorry I think the copy/paste mixed it up. In the actual dataframe, all those dates are actually the date-time index, and I'm reading in to `DTW` just the values for the `US` and `Brazil` series, which are actually two time series corresponding to daily new cases of COVID in each country from 01/23/2020 to 08/21/2020. So that should work right? – Hefe Feb 12 '22 at 19:24
  • @Julien -- you should add the input data to your question. – DarrylG Feb 12 '22 at 19:49
  • Sorry, I would but it's not my question--I can write up one though if you'd like. – Hefe Feb 12 '22 at 20:04
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/241957/discussion-between-darrylg-and-julien). – DarrylG Feb 12 '22 at 20:19