
I am trying to combine two spatial xarray datasets using combine_by_coords. The two datasets are adjacent tiles, so some of their coordinates overlap. In the overlapping region, the variable values of one of the datasets are NaN.

I used combine_by_coords with the compat='no_conflicts' option. However, it raises the "monotonic global indexes along dimension y" error. It looks like this was an issue before but it was fixed (here), so I don't understand why I get this error. Here is an example (the netcdf tiles are here):

import xarray as xr

print(xr.__version__)
# 0.15.1

ds1 = xr.open_dataset('Tile1.nc')
ds2 = xr.open_dataset('Tile2.nc')
ds = xr.combine_by_coords([ds1, ds2], compat='no_conflicts')
# ...
# ValueError: Resulting object does not have monotonic global indexes along dimension y

Thanks

Ress
  • It seems like a bug to me. Inside `combine_by_coords`, indexes returned by [`_combine_nd`](https://github.com/pydata/xarray/blob/f3ca63a4ac5c091a92085b477a0d34c08df88aa6/xarray/core/combine.py#L735) are effectively non-monotonic for your y coord, and I can't see why. – paime Jul 09 '20 at 13:50
  • Thanks. Is there an alternative way (xarray or other packages) to do the same thing? I'll open an issue on the xarray GitHub... – Ress Jul 09 '20 at 14:37
  • It works with `xr.merge([ds1, ds2])`, which makes the failure of `xr.combine_by_coords` even more suspicious. Maybe you can open an issue. – paime Jul 09 '20 at 15:26
  • I just opened an issue on xarray. The merging distorts the data. It changes the values, displaces the pixels and also leaves strips of nan values on the image. – Ress Jul 09 '20 at 15:50

1 Answer


This isn't a bug: it's throwing the error it should throw given your input. However, I can see how the documentation doesn't make it very clear why this is happening!

combine_by_coords and combine_nested do two things: they concatenate (using xr.concat) and they merge (using xr.merge). merge groups together variables that line up along the same coordinates, while concat attaches variables end to end along a dimension. The concatenation step was never meant to handle partially overlapping coordinates, so the combine functions have the same restriction.

That error is an explicit rejection of the input you gave it: "you gave me overlapping coordinates, I don't know how to concatenate those, so I'll reject them." Normally this makes sense: when the values at the overlapping coordinates aren't NaN, it's ambiguous which values to choose.
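A minimal reproduction on in-memory data, assuming two hypothetical toy tiles (variable `v`, stand-ins for Tile1.nc and Tile2.nc) stacked along y that share the rows y=2 and y=3, with NaNs in the second tile there. It shows that plain concatenation duplicates the shared labels, and that combine_by_coords rejects the result even with compat='no_conflicts'.

```python
import numpy as np
import xarray as xr

# Hypothetical toy tiles: both cover x=0..2, and they overlap at y=2 and y=3.
x = [0, 1, 2]
ds1 = xr.Dataset({"v": (("y", "x"), np.ones((4, 3)))},
                 coords={"y": [0, 1, 2, 3], "x": x})
values2 = np.full((4, 3), 2.0)
values2[:2] = np.nan  # the second tile is NaN in the overlapping rows
ds2 = xr.Dataset({"v": (("y", "x"), values2)},
                 coords={"y": [2, 3, 4, 5], "x": x})

# Plain concatenation appends one tile to the other, so the shared y labels
# appear twice and the y index is no longer monotonic.
concatenated = xr.concat([ds1, ds2], dim="y")
print(concatenated.y.values)  # [0 1 2 3 2 3 4 5]

# combine_by_coords concatenates the same way, then checks that the
# resulting index is monotonic and raises if it is not.
error_message = ""
try:
    xr.combine_by_coords([ds1, ds2], compat="no_conflicts")
except ValueError as err:
    error_message = str(err)
print(error_message)
```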

In your case, however, you are asking it to perform a well-defined operation, and the discussion in the docs about merging overlapping coordinates here implies that compat='no_conflicts' would handle this situation. Unfortunately, that only applies to xr.merge, not to xr.concat, so it doesn't apply to combine_by_coords either. This is definitely confusing.
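For comparison, a sketch of what compat='no_conflicts' does inside xr.merge, using the same hypothetical toy tiles. It assumes the overlapping coordinate labels match exactly, which real float coordinates may not: merge aligns on the union of the labels (join="outer") and fills NaNs in one tile from the other, raising only if two non-NaN values disagree.

```python
import numpy as np
import xarray as xr

# Hypothetical toy tiles overlapping at y=2 and y=3; the second tile is NaN there.
x = [0, 1, 2]
ds1 = xr.Dataset({"v": (("y", "x"), np.ones((4, 3)))},
                 coords={"y": [0, 1, 2, 3], "x": x})
values2 = np.full((4, 3), 2.0)
values2[:2] = np.nan
ds2 = xr.Dataset({"v": (("y", "x"), values2)},
                 coords={"y": [2, 3, 4, 5], "x": x})

# Align on the union of y labels, then fill NaNs in ds1 with values from ds2.
merged = xr.merge([ds1, ds2], compat="no_conflicts", join="outer")
print(merged.y.values)        # [0 1 2 3 4 5]
print(merged.v.values[:, 0])  # [1. 1. 1. 1. 2. 2.]
```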

It might be possible to generalise the combine functions to handle the scenario you're describing (where the overlapping parts of the coordinates are specified entirely by the non-NaN values). Please open an issue proposing this feature if you would like to see it.

(Issue #3150 was about something else, an actual bug in the handling of "coordinate dimensions which do not vary between each dataset".)

Instead, what you need to do is trim off the overlap first. That shouldn't be hard: presumably you know (or can determine) how big your overlap is, and all your NaNs are in one dataset. You just need to use the .isel() method with a slice. Once you've removed the overlapping NaNs, you should be able to combine the datasets fine (and you shouldn't need to specify compat either). If you're using combine_by_coords as part of opening many files with open_mfdataset, it might be easier to write a trimming function and apply it first using the preprocess argument to open_mfdataset.
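A minimal sketch of the trimming step on the same hypothetical toy tiles. `trim_overlap` is a hypothetical helper, not part of xarray: it counts how many of the second tile's y labels already exist in the first tile and drops that many leading rows with `.isel()` and a slice. It assumes the overlap sits at the start of the second tile and that labels match exactly; tiles with float coordinates may need a tolerance-based comparison instead.

```python
import numpy as np
import xarray as xr

# Hypothetical toy tiles overlapping at y=2 and y=3; the second tile is NaN there.
x = [0, 1, 2]
ds1 = xr.Dataset({"v": (("y", "x"), np.ones((4, 3)))},
                 coords={"y": [0, 1, 2, 3], "x": x})
values2 = np.full((4, 3), 2.0)
values2[:2] = np.nan
ds2 = xr.Dataset({"v": (("y", "x"), values2)},
                 coords={"y": [2, 3, 4, 5], "x": x})


def trim_overlap(ds, reference, dim="y"):
    """Hypothetical helper: drop the leading labels of `ds` along `dim`
    that also appear in `reference` (assumes the overlap is at the start)."""
    n_overlap = int(ds[dim].isin(reference[dim].values).sum())
    return ds.isel({dim: slice(n_overlap, None)})


ds2_trimmed = trim_overlap(ds2, ds1)  # keeps only y=4 and y=5
combined = xr.combine_by_coords([ds1, ds2_trimmed])  # no compat needed
print(combined.y.values)  # [0 1 2 3 4 5]
```

If the overlap size is fixed and known in advance, a plain `ds.isel(y=slice(n, None))` does the same job without the helper.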

ThomasNicholas
  • Thank you for the explanation! Now it makes more sense. – Ress Jul 30 '20 at 17:02
  • Do you have an example of such a trimming function? In my case I have time data that overlaps (but both arrays are filled, and I want to keep the last). – 3dSpatialUser Apr 18 '23 at 15:27
  • Your trimming function just accepts a dataset and returns a (trimmed) dataset. So you could literally use a lambda function like `preprocess=lambda ds: ds.isel(time=slice(0, -1))`, for example. – ThomasNicholas Apr 20 '23 at 16:04