
I am using daily data to calculate monthly averages using ensemble_mean. Once I have the file with the monthly average, I regrid the file from 0.1 to 0.25 degrees using another file as the target grid. The ensemble mean goes well, but when trying to regrid the file I get the following error:

ValueError: CDO error: Error (cdf_put_vara_double): NetCDF: Numeric conversion not representable. Tip: check if missing values are incorrectly set to large actual values!

This happens only in certain months. For some others, the regridding process works perfectly.

The code I am using is:

import nctoolkit as nc

ds = nc.open_data("/home/omi_data/HCHO/data/2006/12/*.nc4")
ds1=nc.open_data("/home/omi_data/NO2/data/2006/07/OMI-Aura_L3-OMNO2d_2006m0702_v003-2019m1121t032327.he5.ncml.nc4")
ds.ensemble_mean('key_science_data_column_amount')
ds.regrid(ds1)
ds.to_nc('/home/omi_data/HCHO/data/2006/monthly_average/HCHO_0612.nc4')

Data link

jairovs
  • Can you add the code you used to the question? The problem is likely caused by the variable formats, which probably need to be modified – Robert Wilson Feb 09 '23 at 07:21
  • Thank you so much Robert! I have added the code I am using. – jairovs Feb 09 '23 at 07:30
  • Have you checked the ensemble mean is calculated? In your code it will only be run when regrid is called because of lazy evaluation. My guess is that it’s the numeric type that is causing ensemble mean to fail. Try adding ‘ds.run()’ after ensemble mean – Robert Wilson Feb 09 '23 at 07:42
  • when using cdo from the command line this error is due to a precision change, and is resolved by converting shorts to floats using the option "-b f32" - Robert is it possible with your package to pass such options? – ClimateUnboxed Feb 09 '23 at 10:37
  • That was my guess based on the question, @AdrianTompkins. You can change numerical precision using the `set_precision` method. In this case `ds.set_precision("F32")` should work. Under the hood, this would be the same as your answer. – Robert Wilson Feb 09 '23 at 13:34
  • Also, the error here is not as helpful as it could be. I've just modified the dev version so that it suggests changing the numerical precision – Robert Wilson Feb 09 '23 at 13:41
  • I very much appreciate all your answers. I tried adding "ds.run()" after the ensemble mean, but it did not solve the issue. I will try all the other suggestions and will update the post in the coming days. – jairovs Feb 10 '23 at 06:53
  • did my soln work? – ClimateUnboxed Mar 01 '23 at 07:56
  • I tried changing the precision of the dataset using "ds.set_precision("F32")" and did not work. Robert suggested I share some of the files, but I did not manage to do it using Pastebin. Any suggestion on how I could share some of the files? Thank you! – jairovs Mar 07 '23 at 07:17
  • You need to @ people for them to be notified. Only just seeing this now. You can try sharing via dropbox etc. if public doesn't work. My contact details here: https://nctoolkit.readthedocs.io/en/latest/info.html – Robert Wilson Mar 09 '23 at 12:56
  • Thank you @RobertWilson. I have added a link to the original question with four files. The regridding did not work for the files in February, and it did work for the December ones. If it is not possible to access the files, I will share them through your contact details. I appreciate your help. – jairovs Mar 14 '23 at 06:46
  • OK @jairovs. I'm not seeing any issue with those data files when I do an ensemble mean. Was ensemble_mean or regrid causing the error? If the former then maybe this is a CDO versioning issue. You haven't provided the file for ds1, so I can't reproduce the regridding issue – Robert Wilson Mar 14 '23 at 08:04
  • Thank you @RobertWilson. You are right, I realized that regridding the ensemble mean of only two files does not give any issue. The problem appears when regridding the ensemble mean of the 28 files for the whole month. After that, I tried first regridding the 28 files and then calculating the ensemble mean of the already regridded files, and it worked. So, for those files, it is better to regrid first and then take the ensemble mean. Do you know why? I added the files for the whole month and the ds1 file, which is the target grid file. I very much appreciate your support. – jairovs Mar 16 '23 at 06:29

2 Answers


When using cdo from the command line this error is due to a precision change, and is resolved by converting shorts to floats using the option "-b f32".

This is a command-line based quick fix, but I'm sure Robert can come up with a better fix within his package itself.

for file in /home/omi_data/HCHO/data/2006/12/*.nc4 ; do cdo -b f32 -f nc4 copy "$file" "${file%.nc4}_flt.nc4" ; done

This converts all the files to files with floats, and then in your python code you need to refer to

/home/omi_data/HCHO/data/2006/12/*_flt.nc4 

to ensure you only pick up the converted files. As I said, this is a clunky quick fix. EDIT: I'm pasting in Robert's comment from above, so the credit is his. You can change the precision in his package by using

ds.set_precision("F32")

You can also do this from within python itself using the cdo package, I think this is the equivalent (untested, I hope the wildcard works like this.)

from cdo import Cdo

cdo = Cdo()
ifile = "/home/omi_data/HCHO/data/2006/12/*.nc4"
gridfile = "/home/omi_data/NO2/data/2006/07/OMI-Aura_L3-OMNO2d_2006m0702_v003-2019m1121t032327.he5.ncml.nc4"

# Ensemble mean over the daily files, written as 32-bit floats
cdo.ensmean(input=ifile, output="ensmean.nc", options="-b f32")
# Bilinear remapping of the mean onto the grid of the target file
cdo.remapbil(gridfile, input="ensmean.nc", output="ensmean_regrid.nc")

ClimateUnboxed

This problem appears to be caused by issues in the raw data. The netCDF files say the data format is F32. However, one of the files actually has data values that are outside the maximum range accepted by F32, which is a mistake made during file creation. This causes problems in CDO when nctoolkit calls it. As the error says, you have data that cannot be represented in 32 bits. Essentially, you will have to correct the raw data before processing it: set anything outside the valid range to NA. The following should work:

import nctoolkit as nc

ds = nc.open_data("/home/omi_data/HCHO/data/2006/12/*.nc4")
# Set values outside the 32-bit float range to missing
ds.as_missing([3.40282347E+38, 1e50])
ds.as_missing([-1e50, -3.40282347E+38])
ds1 = nc.open_data("/home/omi_data/NO2/data/2006/07/OMI-Aura_L3-OMNO2d_2006m0702_v003-2019m1121t032327.he5.ncml.nc4")
ds.ensemble_mean('key_science_data_column_amount')
ds.regrid(ds1)
ds.to_nc('/home/omi_data/HCHO/data/2006/monthly_average/HCHO_0612.nc4')
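To see which values trip the 32-bit limit, here is a minimal NumPy sketch of the same idea, independent of nctoolkit and CDO. The function name `mask_outside_float32` and the sample numbers are made up for illustration; the two `1e40` entries simply stand in for the bad values in the raw files.

```python
import numpy as np

# Largest finite value a 32-bit float can hold (about 3.4028235e+38).
F32_MAX = float(np.finfo(np.float32).max)


def mask_outside_float32(values):
    """Return a float64 copy with values outside the float32 range set to NaN."""
    arr = np.asarray(values, dtype=np.float64)
    out = arr.copy()
    out[np.abs(arr) > F32_MAX] = np.nan
    return out


# Hypothetical column amounts; the +/-1e40 entries cannot be stored as float32.
sample = np.array([1.2e16, -3.0e15, 1e40, -1e40])
cleaned = mask_outside_float32(sample)
n_bad = int(np.isnan(cleaned).sum())
print(n_bad, "value(s) fall outside the float32 range")
```

Once the out-of-range entries are NaN, the remaining values convert to float32 without overflow, which is roughly what the two `as_missing` calls achieve before CDO writes the output.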

Alternatively, you could also change the units to something more sensible. You really don't want to work with files with such large values, as you will easily run into limits with whatever numerical precision you are working with. However, the units are slightly confusing. The file says "molecules/cm^2". However, you can get positive and negative values. I don't understand that, so I can't provide guidance on changing the units.

Robert Wilson
  • Thanks. Can you upvote or accept the answer? So the negatives should be Nan? You can fix that with as_missing – Robert Wilson Mar 20 '23 at 07:07
  • Thank you @RobertWilson, this worked perfectly. As you suggested, I could do the regridding after correcting the data using ds.as_missing. The negative values are data below the baseline. I filtered the data using pandas after the regridding process, masking all the negatives and values above 10e17. I wanted to filter the files before the ensemble mean and regridding using nctoolkit, but I did not find an option to do it. Is it possible? An option for working with lower values is dividing the data by 10e15. I also did it with pandas. Thank you so much for nctoolkit, it has been a great help! – jairovs Mar 20 '23 at 07:15
  • Hello @Robert, sorry for bothering you again. I have a couple of questions I would like to ask. I need to set as_missing all the negatives and all values above 1e17, but at the same time, I want to fill all the empty grids using fill_na. What would be the correct order to do it? Is it better to fill the na’s first and then do the as_missing or the other way? And what would be a good number to use in the fill_na()? I wonder what happens if several contiguous grids are empty when using fill_na. I very much appreciate your guidance! – jairovs Apr 14 '23 at 07:44
  • If you can please ask this here, as I cannot answer this in the comments due to lack of space https://github.com/pmlmodelling/nctoolkit/discussions – Robert Wilson Apr 14 '23 at 07:59
  • Thank you Robert. I have asked the question in the discussions forum – jairovs Apr 17 '23 at 00:07
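
The filtering discussed in these comments (mask negatives and very large values first, then average) can be sketched in plain NumPy. This only illustrates the logic and is not nctoolkit code; the answer above already shows `as_missing` with a range applied before `ensemble_mean`. The helper name `masked_mean`, the bounds 0 and 1e17, and the tiny 2x2 "daily" grids are assumptions chosen for the example.

```python
import numpy as np


def masked_mean(daily, lower=0.0, upper=1e17):
    """Mean over the first (time) axis, ignoring values outside [lower, upper]."""
    stack = np.asarray(daily, dtype=np.float64)
    valid = (stack >= lower) & (stack <= upper)
    masked = np.where(valid, stack, np.nan)
    # Count valid days per cell so that cells with no valid data become NaN.
    counts = valid.sum(axis=0)
    totals = np.nansum(masked, axis=0)
    return np.where(counts > 0, totals / np.maximum(counts, 1), np.nan)


# Two hypothetical days on a 2x2 grid (values in molecules/cm^2).
days = np.array([
    [[2e15, -1e15], [5e17, 4e15]],
    [[4e15, 3e15], [1e40, 6e15]],
])
monthly = masked_mean(days)
print(monthly)
```

Masking before averaging matters because a single out-of-range day would otherwise dominate the mean of its grid cell. Cells with no valid day stay empty (NaN) instead of being filled.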