Multidimensional data in Holoviews

Question

I have a 4-D dataset (as xr.DataArray) with dimensions temperature, datasource, time, and altitude.

How can I create a scatter plot of temperature(src0, z) vs. temperature(src1, z), so that I can select the altitude via a slider?

My current problem is that when I convert the data to a hv.Table, I get (among others) one column datasource and one column temperature, and I cannot figure out how to plot temperature(datasource=='src0') vs. temperature(datasource=='src1').

EDIT:

To clarify: I have a 4-D dataset DATA (an xr.DataArray) with dimensions data_variable, datasource, time, and altitude.

data_variable has 2 entries, temperature and humidity.

datasource has 2 entries, model and measurement.

There are 6 altitudes and ~2000 times.

How can I create a scatter plot which has

on the x-axis the data for the datasource model
on the y-axis the data for the datasource measurement

such that altitude and data_variable can be selected with a slider?

If you want altitude as a slider, how do you want it to behave, given that altitude is a real number, and so the slider value may or may not match any particular data value? Is the data gridded with respect to altitude, or do you want to bin by altitude into a finite set of altitude values? — James A. Bednar, Mar 20 '17 at 14:02
the altitude slider should select one of the 6 values of the altitude dimension at a time — andreas-h, Mar 20 '17 at 15:27

philippjfr · Accepted Answer · 2017-03-20T16:50:21.807

If I'm understanding your question correctly, you want to plot scatter values for temperature over time, comparing the two datasources and indexed by altitude?

import holoviews as hv

# Load the data into a holoviews Dataset
# (data_array is your xr.DataArray)
ds = hv.Dataset(data_array)

# Create Scatter objects plotting time vs. temperature
# and group by altitude and datasource
scatter = ds.to(hv.Scatter, 'time', 'temperature',
                groupby=['altitude', 'datasource'], dynamic=True)

# Now overlay the datasource dimension and display
scatter.overlay('datasource')

Hopefully I understood your question correctly, but based on this basic pattern you should be able to plot the data in whatever arrangement you want.

Edit: Based on your edit, the main problem is that HoloViews expects each data_variable to be in a separate array. In pandas terms, you need to do the equivalent of pd.melt.

import numpy as np
import xarray as xr
import holoviews as hv

# Define data array like yours
dataarray = xr.DataArray(np.random.rand(10, 10, 2, 2), name='variable',
                   coords=[('time', range(10)), ('altitude', range(10)),
                           ('datasource', ['model', 'measurement']),
                           ('data_variable', ['humidity', 'temperature'])])

# Groupby datasource and data_variable, combining the resultant array into a Dataset with 4 data variables
group_dims = ['datasource', 'data_variable']
grouped = hv.Dataset(dataarray, datatype=['xarray']).groupby(group_dims)
dataset = xr.merge([da.data.rename({'variable': ' '.join(key)}).drop(group_dims)
                    for key, da in grouped.items()])

ds = hv.Dataset(dataset)
scatter = ds.to(hv.Scatter, 'model temperature', 'measurement temperature', 'altitude')

Note, however, that while testing this I ran into a bug, for which I've now opened a PR (see here).
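As a possible alternative to the groupby/merge step above, the reshaping can also be sketched with xarray alone: `DataArray.to_dataset(dim=...)` splits one dimension into separate data variables. This is only a minimal sketch on synthetic data. The array shape, the 6 altitudes, and the name `wide` are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the asker's data (shape and values are assumptions)
dataarray = xr.DataArray(
    np.random.rand(10, 6, 2, 2), name='variable',
    coords=[('time', range(10)), ('altitude', range(6)),
            ('datasource', ['model', 'measurement']),
            ('data_variable', ['humidity', 'temperature'])])

# Turn each entry of the datasource dimension into its own data variable.
# The result has variables 'model' and 'measurement', each with
# dimensions (time, altitude, data_variable).
wide = dataarray.to_dataset(dim='datasource')

print(list(wide.data_vars))
print(wide['model'].dims)
```

With the data in this shape, the same `ds.to(hv.Scatter, ...)` pattern as above should apply, with 'model' and 'measurement' as the two axes and altitude and data_variable as the groupby dimensions. I have not tested that combination here, so treat it as a starting point.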

No, that's not what I want. On the `x` axis there should be the temperature for `datasource[0]`, and on the y axis the temperature for `datasource[1]`. — andreas-h, Mar 20 '17 at 14:50
