NOTE: The post looks longer than it ought to because of docstrings and an array consisting of 40 datetimes.

I have some time-series data. For example's sake, let's say I have three parameters, each consisting of 40 data points: datetimes (given by dts), speed (given by vobs), and elapsed hours (given by els), which are combined by key into a dictionary data_dictionary.

import numpy as np

dts = np.array(['2006/01/01 02:30:04', '2006/01/01 03:30:04', '2006/01/01 03:54:04'
 ,'2006/01/01 05:30:04', '2006/01/01 06:30:04', '2006/01/01 07:30:04'
 ,'2006/01/01 08:30:04', '2006/01/01 09:30:04', '2006/01/01 10:30:04'
 ,'2006/01/01 11:30:04', '2006/01/01 12:30:04', '2006/01/01 13:30:04'
 ,'2006/01/01 14:30:04', '2006/01/01 15:30:04', '2006/01/01 16:30:04'
 ,'2006/01/01 17:30:04', '2006/01/01 18:30:04', '2006/01/01 19:30:04'
 ,'2006/01/01 20:30:04', '2006/01/01 21:30:04', '2006/01/01 21:54:05'
 ,'2006/01/01 23:30:04', '2006/01/02 00:30:04', '2006/01/02 01:30:04'
 ,'2006/01/02 02:30:04', '2006/01/02 03:30:04', '2006/01/02 04:30:04'
 ,'2006/01/02 05:30:04', '2006/01/02 06:30:04', '2006/01/02 07:30:04'
 ,'2006/01/02 08:30:04', '2006/01/02 09:30:04', '2006/01/02 10:30:04'
 ,'2006/01/02 11:30:04', '2006/01/02 12:30:04', '2006/01/02 13:30:04'
 ,'2006/01/02 14:30:04', '2006/01/02 15:30:04', '2006/01/02 16:30:04'
 ,'2006/01/02 17:30:04'])

vobs = np.array([158, 1, 496, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
    , 1, 1, 823, 1, 1, 1, 1, 303, 1, 1, 1, 1, 253, 1, 1, 1, 408, 1
    , 1, 1, 1, 321])

els = np.array([i for i in range(len(vobs))])

data_dictionary = {'datetime' : dts, 'values' : vobs, 'elapsed' : els}

I have a function that takes a dictionary as an input and outputs a single scalar value of type <float> or type <int>. The function given below is simpler than my actual use case and is given for example's sake.

def get_z(dictionary):
    """ This function returns a scalar value. """
    return np.sum(dictionary['elapsed'] / dictionary['values'])

I would like to see how this function output changes as the time-interval changes. So, I've created a function that takes a dictionary as input and outputs a new dictionary, the array values of which are sliced at the input indices for each of the keys in the input dictionary. Note that the consecutive elapsed hours can serve as indices.

def subsect(dictionary, indices):
    """ This function returns a dictionary, the array values
        of which are sliced at the input indices. """
    return {key : dictionary[key][indices] for key in list(dictionary.keys())}

To verify that the above functions work, one can run the for-loop containing the function read_dictionary(...) below.

def read_dictionary(dictionary):
    """ This function prints the input dictionary as a check. """
    for key in list(dictionary.keys()):
        print(" .. KEY = {}\n{}\n".format(key, dictionary[key]))

print("\nORIGINAL DATA DICTIONARY\n")
read_dictionary(data_dictionary)

# for i in range(1, 38):
    # mod_dictionary = subsect(data_dictionary, indices=slice(i, 39, 1))
    # print("\n{}th MODIFIED DATA DICTIONARY\n".format(i))
    # read_dictionary(mod_dictionary)

My issue is that I would like a contour plot. The x-axis will contain the lower bound of the datetime interval (the first entry of mod_dictionary[i]) while the y-axis will contain the upper bound of the datetime interval (the last entry of mod_dictionary[i]). Normally when making a contour plot, one has an array of (x,y) values that are made into a grid (X,Y) via numpy.meshgrid. As my actual function (not the one in the example) is not vectorized, I can use X.copy().reshape(-1) and reshape my result back using (...).reshape(X.shape).

My exact problem is that I do not know how I can make a grid of different parameters using a single dictionary as an input for a function that outputs a single scalar value. Is there a way to do this?

2 Answers

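If I understood your idea correctly, this should be what you need. It requires the following packages (note that matplotlib.mlab.griddata was removed in Matplotlib 3.1, so this import needs an older Matplotlib release):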

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.mlab import griddata
import pandas as pd

First, the required values are stored in three lists. I had to change the for loop a little because in your example all upper bounds were the same, so no contour plot was possible:

lower_bounds = []
upper_bounds = []
z_values = []
for j in range(1, 30):
  for i in range(0,j):
    mod_dictionary = subsect(data_dictionary, indices=slice(i, j, 1))
    lower_bounds.append(mod_dictionary['datetime'][0])
    upper_bounds.append(mod_dictionary['datetime'][-1])
    z_values.append(get_z(mod_dictionary))

Then the datetime strings are converted to Timestamps:

lower_bounds_dt = [pd.Timestamp(date).value for date in lower_bounds]
upper_bounds_dt = [pd.Timestamp(date).value for date in upper_bounds]

And the grid for the contour plot is generated:

xi = np.linspace(min(lower_bounds_dt), max(lower_bounds_dt), 100)
print(xi)
yi = np.linspace(min(upper_bounds_dt), max(upper_bounds_dt), 100)
print(yi)

Using griddata the missing grid points for the z values are generated.

zi = griddata(lower_bounds_dt, upper_bounds_dt, z_values, xi, yi)
print(zi)
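
For newer Matplotlib versions without mlab.griddata, the same interpolation step can be sketched with scipy.interpolate.griddata. This is only a minimal sketch under stated assumptions: the sample points are hypothetical stand-ins for the (lower bound, upper bound, z) triples collected above, and the bounds are expressed in elapsed hours rather than nanosecond timestamps to keep the coordinates small.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical scattered samples: every (lower, upper) pair with lower < upper,
# expressed in elapsed hours.
lower = np.array([i for j in range(1, 6) for i in range(j)], dtype=float)
upper = np.array([j for j in range(1, 6) for i in range(j)], dtype=float)
z = lower + 2.0 * upper  # stand-in for the get_z(...) results

# Regular grid spanning the sampled range
xi = np.linspace(lower.min(), lower.max(), 20)
yi = np.linspace(upper.min(), upper.max(), 20)
XI, YI = np.meshgrid(xi, yi)

# Linear interpolation onto the grid; points outside the convex hull
# of the samples are filled with NaN.
ZI = griddata((lower, upper), z, (XI, YI), method='linear')
```

The resulting ZI has the same shape as XI and YI, so it can be passed to contourf in place of zi. The NaN cells correspond to intervals whose lower bound lies after the upper bound.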

Finally you can use contour or contourf to generate the contour plot:

fig1 = plt.figure(figsize=(10, 8))
ax1 = fig1.add_subplot(111)
ax1.contourf(xi, yi, zi)
fig1.savefig('graph.png')

Because the generated data currently covers only a small band (the lower and upper bounds in the for loop increase together), the result looks like this:

[Image: result of contourf]

You could easily change this by changing the way you span your data arrays in the for loop. Using pd.to_datetime you could also display the x and y axis in your preferred datetime format.
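
As a rough illustration of the pd.to_datetime suggestion, the integer nanosecond values used for the axes can be converted back into formatted labels. The tick positions and the format string below are assumptions chosen for the example, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Integer nanoseconds since the Unix epoch, as produced by pd.Timestamp(...).value
ns = np.array([pd.Timestamp(s).value
               for s in ['2006/01/01 02:30:04', '2006/01/02 17:30:04']])

# Hypothetical tick positions: five evenly spaced values across the axis range
ticks = np.linspace(ns.min(), ns.max(), 5)

# Convert back to datetimes and format them for display
labels = pd.to_datetime(ticks.astype('int64')).strftime('%m/%d %H:%M')
```

These could then be applied with ax1.set_xticks(ticks) followed by ax1.set_xticklabels(labels, rotation=45), and likewise for the y-axis.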

Edit: I uploaded the complete example to repl.it

Axel
  • I will play around with this in a little bit, thanks for the complete example. – May 07 '18 at 22:24
  • When you call `mod_dictionary = subsect(data_dictionary, indices=slice(i, i+18, 1))` in the for-loop, notice that the time-interval stays constant. If one could do all possible combinations of index-slices, then one could generate a contour plot over the top-left triangle of the grid (as the bottom-right triangle of the grid would be composed of points at which the initial datetime would be later than the final datetime). Doing so would also remove the need to [interpolate values](https://stackoverflow.com/questions/5615978/program-hangs-when-using-matplotlib-mlab-griddata/5623980#5623980). –  May 08 '18 at 09:20
  • I was trying to nest a for loop in a while loop that iteratively adjusted bounds (unsuccessfully); your solution is much more elegant! Thank you. – May 08 '18 at 10:52

Using the solution posted by @Axel, I was able to make the contour plot without using griddata and pandas. (I need to edit the ticklabels, but that is not my concern here. The elapsed hours from the original dictionary can be used as indices to slice the array of datetimes for this purpose). The advantage of this approach is that interpolation is not required, and the use of numpy vectorization beats the speed obtained using a double for-loop.
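
The tick-label idea mentioned in parentheses could look roughly like the following. The helper name labels_for_ticks and the short demo array are hypothetical; the only assumption taken from the post is that elapsed hours double as indices into the datetime array.

```python
import numpy as np

def labels_for_ticks(datetimes, tick_positions):
    """Hypothetical helper: map elapsed-hour tick positions to datetime strings."""
    idx = np.asarray(tick_positions, dtype=int)
    # Keep only positions that are valid indices into the datetime array
    idx = idx[(idx >= 0) & (idx < len(datetimes))]
    return idx, [str(datetimes[i]) for i in idx]

# Small stand-in for the datetime array of the question
dts_demo = np.array(['2006/01/01 02:30:04', '2006/01/01 03:30:04',
                     '2006/01/01 03:54:04', '2006/01/01 05:30:04'])
ticks, labels = labels_for_ticks(dts_demo, [0, 2, 4])
```

With the plot from this answer, something like plt.xticks(ticks, labels, rotation=45) (and plt.yticks likewise) should then show datetimes instead of elapsed hours.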

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker

def initialize_xy_grid(data_dictionary):
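    """ Return meshgrids of the lower (x) and upper (y) interval bounds,
        both as elapsed hours and as datetimes. """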
    params = {'x' : {}, 'y' : {}}
    params['x']['datetime'] = data_dictionary['datetime'][:-1]
    params['x']['elapsed'] = data_dictionary['elapsed'][:-1]
    params['y']['datetime'] = data_dictionary['datetime'][1:]
    params['y']['elapsed'] = data_dictionary['elapsed'][1:]
    X_dt, Y_dt = np.meshgrid(params['x']['datetime'], params['y']['datetime'])
    X_hr, Y_hr = np.meshgrid(params['x']['elapsed'], params['y']['elapsed'])
    return X_hr, Y_hr, X_dt, Y_dt

def initialize_z(data_dictionary, X, Y):
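    """ Return a flat array holding get_z for the slice from each
        x index to the matching y index of the grid. """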
    xx = X.copy().reshape(-1)
    yy = Y.copy().reshape(-1)
    return np.array([get_z(subsect(data_dictionary, indices=slice(xi, yi, 1))) for xi, yi in zip(xx, yy)])

def initialize_Z(z, shape):
    """ Reshape the flat array z back to the grid shape. """
    return z.reshape(shape)

X_hr, Y_hr, X_dt, Y_dt = initialize_xy_grid(data_dictionary)
z = initialize_z(data_dictionary, X_hr, Y_hr)
Z = initialize_Z(z, X_hr.shape)

ncontours = 11
plt.contourf(X_hr, Y_hr, Z, ncontours, cmap='plasma')
contours = plt.contour(X_hr, Y_hr, Z, ncontours, colors='k')
fmt_func = lambda x, pos : "{:1.3f}".format(x)
fmt = matplotlib.ticker.FuncFormatter(fmt_func)
plt.clabel(contours, inline=True, fontsize=8, fmt=fmt)
plt.show()
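
One detail of the grid approach is that cells where the lower index is not below the upper index produce an empty slice, so get_z returns 0 there and the lower-right triangle is drawn as a flat region. A possible refinement, sketched here on small hypothetical data rather than the full data_dictionary, is to mask those cells before plotting.

```python
import numpy as np

# Small stand-in data (hypothetical); in the answer these come from data_dictionary.
values = np.array([158, 1, 496, 1, 1, 823])
elapsed = np.arange(len(values))

x = elapsed[:-1]  # lower-bound indices
y = elapsed[1:]   # upper-bound indices
X, Y = np.meshgrid(x, y)

# Same non-vectorized pattern as above: one scalar per (lower, upper) pair.
z = np.array([np.sum(elapsed[lo:hi] / values[lo:hi])
              for lo, hi in zip(X.ravel(), Y.ravel())])
Z = z.reshape(X.shape)

# Mask cells whose interval is empty (lower >= upper) so they are not drawn as 0.
Z_masked = np.ma.masked_where(X >= Y, Z)
```

Passing Z_masked instead of Z to plt.contourf and plt.contour should leave the masked triangle blank, since Matplotlib's contour functions accept masked arrays.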