I'm trying to fill NaN values in a NetCDF file (let's call it the 'Target' file) with values from another NetCDF file (the 'Source' file). [The two example files can be downloaded from here.] I was thinking of doing this in Python using the following framework:

Step 1: identify the NaN values in the Target file, extract their locations (lat/lon), and store them in a dataframe

Step 2: extract the values at the stored lat/lon locations from the Source file

Step 3: write these values into the Target file

I came up with the following code:

import pandas as pd
import xarray as xr
import numpy as np

Source = xr.open_dataset("Source.nc")
Target = xr.open_dataset("Target.nc")

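# Step 1: find the grid cells that are NaN or hold the 32767 fill value
# (isin([32767, 'nan']) compares against the string 'nan' and never matches a real NaN)
df = Target.to_dataframe().reset_index()
df2 = df.loc[df['ET'].isna() | (df['ET'] == 32767)]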


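# Step 2: look up the nearest Source value for every missing point
lat = df2["lat"]
lon = df2["lon"]
point_list = zip(lat, lon)

frames = []

for i, j in point_list:
    dsloc = Source.sel(lat=i, lon=j, method='nearest')
    frames.append(dsloc.to_dataframe())

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
Newdf = pd.concat(frames, sort=True)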

There are three issues with this:

1. I don't know how to do step 3.

2. The second step takes forever to complete, perhaps because there are many missing points.

3. This is just for one time step, using the two files.

So I believe there might be better, easier, and faster ways to do this in Python or CDO/NCO. Any ideas and solutions are welcome. Thank you. Note that the two NetCDF files have different spatial resolutions (dimensions).

ClimateUnboxed
Seji

1 Answer

You can use xarray's `where` method for this. You really want to stay away from a Python for loop if you are concerned with efficiency at all. Here's an example of how this would work:

# these are the points you want to keep
# you can fine tune this further (exclude values over a threshold)
condition = target.notnull()

# fill the values where condition is false
target_filled = target.where(condition, source)
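
To make the "fine tune" remark concrete, here is a minimal sketch on made-up data. It assumes both arrays already share the same lat/lon grid, and that 32767 is a fill sentinel (the question's `isin` check suggests this). The toy values and grid are hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical toy grids; both arrays share identical lat/lon coordinates.
lat = [10.0, 20.0]
lon = [100.0, 110.0]
target = xr.DataArray(
    [[1.0, np.nan], [32767.0, 4.0]],
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
    name="ET",
)
source = xr.DataArray(
    [[10.0, 20.0], [30.0, 40.0]],
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
    name="ET",
)

# Keep a cell only if it is neither NaN nor the assumed 32767 sentinel.
condition = target.notnull() & (target != 32767)

# Cells where the condition is False are taken from source instead.
target_filled = target.where(condition, source)
print(target_filled.values)
```

The whole grid is handled in one vectorized call, so no per-point loop or dataframe round trip is needed.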
jhamman
  • Thank you, but it returns the error "indexes along dimension 'lat' are not equal", perhaps because the two NetCDF files have different dimensions. – Seji Nov 19 '19 at 00:54
  • You'll need to "regrid" or "reindex" source before calling `where()` then. – jhamman Nov 19 '19 at 04:05
  • I tried aligning them using `a, b = xr.align(source, target, join='right', copy=True)`, but it returns NaN values for my bigger domain! – Seji Nov 19 '19 at 04:22
  • You may want to check out xarray's documentation on interpolating data: http://xarray.pydata.org/en/stable/interpolation.html. – jhamman Nov 22 '19 at 17:34
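
One way the "reindex" suggestion from the comments could look is sketched below, using made-up data on two different grids. It uses `reindex_like` with `method="nearest"` to put the source on the target's coordinates before calling `where`. The toy values are hypothetical. Nearest-neighbour matching is an assumption chosen to mirror the `method='nearest'` lookup in the question; `source.interp_like(target)` would be the interpolating alternative (it needs SciPy).

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: a finer target grid with gaps and a coarser source grid.
target = xr.DataArray(
    [[1.0, np.nan, 3.0], [np.nan, 5.0, 6.0]],
    coords={"lat": [0.0, 1.0], "lon": [0.0, 1.0, 2.0]},
    dims=("lat", "lon"),
    name="ET",
)
source = xr.DataArray(
    [[10.0, 20.0], [30.0, 40.0]],
    coords={"lat": [0.1, 0.9], "lon": [0.3, 1.6]},
    dims=("lat", "lon"),
    name="ET",
)

# Put the source on the target's coordinates by picking, for each target
# lat/lon label, the nearest source label (no interpolation is done).
source_on_target = source.reindex_like(target, method="nearest")

# The indexes now match, so where() no longer complains about unequal indexes.
target_filled = target.where(target.notnull(), source_on_target)
print(target_filled.values)
```

`xr.align(..., join='right')` only matches coordinate labels that are exactly equal, which is why it produced NaN where the two grids do not coincide. With nearest-neighbour reindexing, every target cell gets a source value as long as the source has at least one point along each dimension.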