0

Ultimately, I want to calculate the difference between modelled and measured air quality. I have two netcdf files. The first one (A) contains air particle data from a model with latitude (y1) index-length 100 and longitude (x1) index-length 200. From this A, I want to subtract observation data (B) with latitude (y2) index-length 1300 and longitude (x2) index-length 1300. The actual latitude values of B (in degrees North and East) are present in A, although not exactly, i.e. values in A are evenly spaced (e.g. 55.95°, 55.85°, 55.75°, etc.) but the values in B have 3 decimals and are spaced by changing increments of roughly 0.001 to 0.003.

It feels like this should be straight forward: take obs data in a lat/lon range (e.g. 50.5 to 51°N and 8.1 to 8.2°E) and subtract it from model data in the same lat/lon range.

At first I tried with numpy adapting from this example of calculating 'departure from global temperature'. But I keep running into dead ends.

Then, I tried a gazillion variations of something along the lines of this (which is obviously wrong, but I am no coding wizzard):

anomaly=[]
for j in range(len(100)):
    for k in range(len(200)):
        for i in range(len(1300)):
            if latitude_model[j] == latitude_observation[i] and longitude_model[k] == longitude_observation[i]:
                departure = model_data[0,0,j,k] - observation_data[i,i] #the first two dimensions of the model data are 'time' and 'level'
                anomaly = np.append(departure)

My third approach was with xarray adapting from this example. Xarray would allow to use method='nearest' and tolerance = 0.1 functions which would help with the not-matching lat/lon data (as far as I understand). But after loading the two netcdf files I can't even find an entrance point to how to continue the code. Plus I would probably have to reshape (but how?) the model data to match the observations. Or subtract observation data from the same model grid, if several observation points fall within the same grid.

PS: This question is eventually related to my other question, which is about the same data and problem.

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
pwi
  • 51
  • 7
  • What is the format of your model and observation data? Can you put them into corresponding numpy arrays? if so, do that, and comparison is easy. Also, you say you "kept running into dead ends" – it would be helpful to know what those dead ends, and any corresponding error messages, are – David Jun 10 '20 at 16:44
  • @David I am not entirely sure what you mean by format. I have described the data in [a related question](https://stackoverflow.com/questions/62253106/calculate-departure-or-anomaly-of-a-value-between-two-arrays-of-different-geogra), perhaps that helps? `xarray` has a [to array](http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_array.html) function. I think that could do what you suggest with 'put them into corresponding numpy arrays' ? – pwi Jun 10 '20 at 17:08

2 Answers2

3

When calculating for the bias between the model and observed, it is important that you match the grids before conducting your analysis. In other words, pre-processing is always a must. So you'll either have to match the grid of the model to the observed or vice-versa before subtracting both files, else, your output won't make sense at all because of the difference. The easiest way to do this is to use special operators like CDO, NCO, NCL, etc.

In your command line (though CDO is available in Python as well but with a different syntax needed than below)

### Match the grids ###
cdo remapbil,obs.nc model.nc model1.nc

### Subtract the files ###
cdo sub model1.nc obs.nc bias.nc

You can then easily map the difference in Python. I prefer this method just because it's easier and lighter than pre-processing the data in Python.

(sent from smartphone)

Dani56
  • 372
  • 1
  • 2
  • 7
  • This works, thank you! I do get weird results but at least this is technically working – pwi Jun 19 '20 at 21:25
  • @pwi That's great news! You might be interested to explore more about said operators to further analyze your data with ease. – Dani56 Jun 20 '20 at 18:34
  • Thanks. I will do that. Also, I finally got exactly what I wanted! Will write it up and post here in the coming days. – pwi Jun 20 '20 at 19:06
2

If you want to do this using Python (with CDO as a backend, which needs to be installed) you can use my package nctoolkit (https://nctoolkit.readthedocs.io/en/latest/installing.html).

So, if your two files are named file1 and file2. You would first read them in as datasets.

import nctoolkit as nc

data1 = nc.open_data(file1)

data2 = nc.open_data(file2)

You can then regrid the first dataset to have the same grid as the first. This is necessary so the cells match up.

data1.regrid(data2)

You can just subtract the second dataset from the first.

data1.sub(data2)

If you want to then convert this to an xarray object you can just do this:

d1_xr = data1.to_xarray()

or if you wanted a pandas dataframe do this:

d1_df = data1.to_dataframe()

There is also an autplotting method, using holoviews:

df1_df.plot()

Community
  • 1
  • 1
Robert Wilson
  • 3,192
  • 11
  • 19
  • Thank you. That seems to be a nice package. Unfortunately, I get the following error during the subtraction step ```AttributeError: 'DataSet' object has no attribute 'sub' ``` I wasn't able to find a solution to the error. I suspect I have some dependency problems in jupyter/python. With Dani56's solution it worked, using the same .nc files. – pwi Jun 19 '20 at 21:26
  • That is odd. What OS/Python version are you using? – Robert Wilson Jun 20 '20 at 06:35
  • macOS 10.15.5; python 3.7.7; jupyter notebook 6.0.3. However, ```conda install -c conda-forge``` did not work (somethings wrong with my conda and I can't fix it) and I did it through pip. While pip didn't work for your package and I had to use brew. Like I said, it is more likely my system causing the problem than your packages. I wouldn't bother looking into it. – pwi Jun 20 '20 at 07:04
  • Thanks. New package, so I'm sure there are still lots of issues to iron out – Robert Wilson Jun 20 '20 at 07:28