
I have some complex data (numpy dtype complex128) in an xarray Dataset which I want to save with to_netcdf. I get the following error:

TypeError: illegal primitive data type, must be one of dict_keys(['S1', 'i1', 'u1', 'i2', 'u2', 'i4', 'u4', 'i8', 'u8', 'f4', 'f8']), got complex128

I do understand that I am passing a data type to the underlying netCDF4 library which is not supported. I also found the netcdf4-python documentation (https://unidata.github.io/netcdf4-python/) on compound data types, but unfortunately I don't see how I can apply that to my problem, as I am not working with the netCDF4 library directly.

Can I save data of the data type complex128 to netcdf while keeping the data type (using xarray.DataArray.to_netcdf)?

MWE:

import numpy as np
import xarray as xr

# note: np.complex was deprecated and later removed from NumPy; use the builtin complex
values = [complex(1.0, 1.0), complex(2.0, 1.0), complex(3.0, 1.0), complex(4.0, 1.0)]
data = xr.DataArray(values)
data.to_netcdf(r'test.nc')
Anna Krause

2 Answers


NetCDF as a file-format does not support complex data. Apparently geoscience users did not have a strong need for saving complex values.

That said, you could indeed write complex128 data to a netCDF file using some sort of ad-hoc convention, such as a custom compound data type. This is similar to the approach used by h5py. This would need to be implemented in xarray itself: a pull request would be welcome.

With the current version of xarray, you have two options for serializing complex values:

  1. Use engine='h5netcdf'. This uses h5py's convention to write complex data. Unfortunately, this results in an invalid netCDF file which is unreadable by netCDF-C. You should see a warning message indicating this if you try it. In a future version of xarray, we will probably require using a dedicated method such as to_hdf5() rather than to_netcdf() for creating such invalid files.

  2. Convert the data into real and imaginary parts, and save them as separate variables. Combine them back into complex values when reading the data back from disk. Pick whatever ad-hoc convention seems best to you.

e.g.,

import xarray

def save_complex(data_array, *args, **kwargs):
    # Split into real and imaginary parts, stored as two real-valued variables
    ds = xarray.Dataset({'real': data_array.real, 'imag': data_array.imag})
    return ds.to_netcdf(*args, **kwargs)

def read_complex(*args, **kwargs):
    # Recombine the two variables into a single complex-valued DataArray
    ds = xarray.open_dataset(*args, **kwargs)
    return ds['real'] + ds['imag'] * 1j
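
A quick sanity check of this split-and-recombine convention, done in memory rather than through a file so it stays self-contained (the variable names 'real' and 'imag' are just the ad-hoc convention used above):

```python
import numpy as np
import xarray

arr = xarray.DataArray(np.array([1 + 1j, 2 + 1j, 3 + 1j]), dims=['x'])

# What save_complex would write: two real-valued variables
ds = xarray.Dataset({'real': arr.real, 'imag': arr.imag})

# What read_complex would reconstruct
restored = ds['real'] + ds['imag'] * 1j
print(bool((restored == arr).all()))  # True
```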
shoyer

Besides the two perfectly good options provided by shoyer, there is another solution that I have found to be useful in practice:

Add another dimension of length 2 to the dataset, which represents the real and imaginary part of the data. This is similar to storing the real and imaginary part in separate variables, but in some cases can be nicer to work with, in my experience.

For example, storing, say, a float variable with dimensions (x, ReIm), where ReIm is the real/imaginary dimension and x an arbitrary dimension, gives a memory layout equivalent to a 1-D array along dimension x of float _Complex in C or, equivalently, std::complex<float> in C++.
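
That layout claim can be checked with NumPy directly: a contiguous (x, 2) float array reinterpreted as complex via np.ndarray.view, which reinterprets the same memory without copying (a minimal sketch, not part of the answer's helpers):

```python
import numpy as np

x = np.arange(3, dtype=np.float32)
pairs = np.stack([x, -x], axis=-1)  # shape (3, 2): last axis holds (real, imag)

# Reinterpret the same memory as complex64: each (real, imag) pair becomes one value
as_complex = pairs.view(np.complex64)[:, 0]
print(as_complex.dtype)  # complex64
```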

Reading and writing works like this:

import xarray

def save_complex(dataset, *args, **kwargs):
    ds = dataset.expand_dims('ReIm', axis=-1)  # add a length-1 ReIm axis at the end
    ds = xarray.concat([ds.real, ds.imag], dim='ReIm')  # real at index 0, imag at index 1
    return ds.to_netcdf(*args, **kwargs)

def read_complex(*args, **kwargs):
    ds = xarray.open_dataset(*args, **kwargs)
    return ds.isel(ReIm=0) + 1j * ds.isel(ReIm=1)

As the example shows, this approach is easy to implement for datasets (not just data arrays).
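
A quick in-memory check of the round trip (here concat alone creates the ReIm dimension, which then ends up first rather than last; the expand_dims step above is what moves it to the trailing axis when that layout matters):

```python
import numpy as np
import xarray

ds = xarray.Dataset({'v': ('x', np.array([1 + 2j, 3 + 4j]))})

# Stack real and imaginary parts along a new ReIm dimension
stacked = xarray.concat([ds.real, ds.imag], dim='ReIm')

restored = stacked.isel(ReIm=0) + 1j * stacked.isel(ReIm=1)
print(bool((restored['v'] == ds['v']).all()))  # True
```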

arccos
  • although I prefer your approach of an additional dimension for real/imaginary over adding a second variable as in the answer of @shoyer, your solution seems to need much more memory (?). While shoyer's solution works well with a test dataset (~2 GB), I run out of memory with yours. Thanks though for the idea! :) – nicrie Nov 08 '20 at 14:18