I tried several methods to make a xarray (xr) dataset out of multiple .h5 files. The files contain data from SMAP project on soil moisture content along with other useful variables. Each variable represent a 2-D Array. The count of variables and their label are in every file equal. The problem is the dimensions size of dimension x and y are not equal.
Example dataset load via xr.open_dataset()
<xarray.Dataset>
Dimensions: (x: 54, y: 129)
Coordinates:
EASE_column_index_3km (x, y) float32 ...
EASE_column_index_apm_3km (x, y) float32 ...
EASE_row_index_3km (x, y) float32 ...
EASE_row_index_apm_3km (x, y) float32 ...
latitude_3km (x, y) float32 ...
latitude_apm_3km (x, y) float32 ...
longitude_3km (x, y) float32 ...
longitude_apm_3km (x, y) float32 ...
Dimensions without coordinates: x, y
Data variables:
SMAP_Sentinel_overpass_timediff_hr_3km (x, y) timedelta64[ns] ...
SMAP_Sentinel_overpass_timediff_hr_apm_3km (x, y) timedelta64[ns] ...
albedo_3km (x, y) float32 ...
albedo_apm_3km (x, y) float32 ...
bare_soil_roughness_retrieved_3km (x, y) float32 ...
bare_soil_roughness_retrieved_apm_3km (x, y) float32 ...
beta_tbv_vv_3km (x, y) float32 ...
beta_tbv_vv_apm_3km (x, y) float32 ...
disagg_soil_moisture_3km (x, y) float32 ...
disagg_soil_moisture_apm_3km (x, y) float32 ...
disaggregated_tb_v_qual_flag_3km (x, y) float32 ...
disaggregated_tb_v_qual_flag_apm_3km (x, y) float32 ...
gamma_vv_xpol_3km (x, y) float32 ...
gamma_vv_xpol_apm_3km (x, y) float32 ...
landcover_class_3km (x, y) float32 ...
landcover_class_apm_3km (x, y) float32 ...
retrieval_qual_flag_3km (x, y) float32 ...
retrieval_qual_flag_apm_3km (x, y) float32 ...
sigma0_incidence_angle_3km (x, y) float32 ...
sigma0_incidence_angle_apm_3km (x, y) float32 ...
sigma0_vh_aggregated_3km (x, y) float32 ...
sigma0_vh_aggregated_apm_3km (x, y) float32 ...
sigma0_vv_aggregated_3km (x, y) float32 ...
sigma0_vv_aggregated_apm_3km (x, y) float32 ...
soil_moisture_3km (x, y) float32 ...
soil_moisture_apm_3km (x, y) float32 ...
soil_moisture_std_dev_3km (x, y) float32 ...
soil_moisture_std_dev_apm_3km (x, y) float32 ...
spacecraft_overpass_time_seconds_3km (x, y) timedelta64[ns] ...
spacecraft_overpass_time_seconds_apm_3km (x, y) timedelta64[ns] ...
surface_flag_3km (x, y) float32 ...
surface_flag_apm_3km (x, y) float32 ...
surface_temperature_3km (x, y) float32 ...
surface_temperature_apm_3km (x, y) float32 ...
tb_v_disaggregated_3km (x, y) float32 ...
tb_v_disaggregated_apm_3km (x, y) float32 ...
tb_v_disaggregated_std_3km (x, y) float32 ...
tb_v_disaggregated_std_apm_3km (x, y) float32 ...
vegetation_opacity_3km (x, y) float32 ...
vegetation_opacity_apm_3km (x, y) float32 ...
vegetation_water_content_3km (x, y) float32 ...
vegetation_water_content_apm_3km (x, y) float32 ...
water_body_fraction_3km (x, y) float32 ...
water_body_fraction_apm_3km (x, y) float32 ...
Example variable dataset.soil_moisture_3km
<xarray.DataArray 'soil_moisture_3km' (x: 54, y: 129)>
array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
EASE_column_index_3km (x, y) float32 ...
EASE_column_index_apm_3km (x, y) float32 ...
EASE_row_index_3km (x, y) float32 ...
EASE_row_index_apm_3km (x, y) float32 ...
latitude_3km (x, y) float32 ...
latitude_apm_3km (x, y) float32 ...
longitude_3km (x, y) float32 ...
longitude_apm_3km (x, y) float32 ...
Dimensions without coordinates: x, y
Attributes:
units: cm**3/cm**3
valid_min: 0.0
long_name: Representative soil moisture measurement for the 3 km Earth...
coordinates: /Soil_Moisture_Retrieval_Data_3km/latitude_3km /Soil_Moistu...
valid_max: 0.75
First i tried to open the files with:
test = xr.open_mfdataset(list_of_paths)
this error occures:
ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension sizes: {129, 132}
Then i try combine by coords
test = xr.open_mfdataset(list_of_paths, combine='by_coords')
produces this error:
ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation
try this:
test = xr.open_mfdataset(list_of_paths, coords=['latitude_3km', 'longitude_3km'], combine='by_coords')
end up with same error.
Then i try to open every file with xr.open_dataset() and try every method i can find on documentation page for combining data like merge, combine, broadcast_like, align & combine... but every time end up with the same problem that the dimensions are not equal. What is the common approach to reshape, align the dimensions or whatever is possible to solve this problem ?
UPDATE :
I found a workaround for my problem, but first I think I have forgotten to mention that the different files which I try to concatenate along the dimension time have different coordinates and dimensions. The images I try to build my model from all have overlapping areas with same longitude and latitude values but also parts with no overlapping.