I have ERA5 files that I am trying to concatenate into monthly files. The files appear to have been packed to reduce their size, so the data inside them are stored as shorts. When I run ncrcat, it warns about encountering the packing attribute "add_offset" and then concatenates all the files, but the values in the result are wrong. Unpacking the files with ncpdq -U and then concatenating with ncrcat works, but the resulting files are too large to be useful, and when I try to repack the result with ncpdq I get a malloc() failure, which seems to be a memory (RAM) issue.

I've also tried cdo merge, which works perfectly for most of the concatenations, but a few of them fail with this error: "Error (cdf_put_vara_double): NetCDF: Numeric conversion not representable"

So is there any way to concatenate these files while they are still packed, or at least a way to repack the large files once they are concatenated?

  • Can you possibly add links to the files that fail? – Robert Wilson Jul 17 '20 at 09:36
  • But in the meantime I would suggest adding `-b 32` to whatever your cdo command is – Robert Wilson Jul 17 '20 at 09:52
  • Sure @Robert, I've uploaded a couple of files here. When I run ncrcat on these two files, the values in the resulting file change at the switch to the next year https://drive.google.com/drive/folders/1EwDuHuPP_MwdSFV_0l772u6lUnhnFZ32?usp=sharing – Jonathan Wille Jul 17 '20 at 10:17
  • I quickly checked, and `cdo mergetime` worked OK with those two files – Robert Wilson Jul 17 '20 at 10:36
  • True, but I found two files where cdo mergetime does not work. I uploaded them here. https://drive.google.com/drive/folders/1EwDuHuPP_MwdSFV_0l772u6lUnhnFZ32?usp=sharing – Jonathan Wille Jul 17 '20 at 12:33
  • Okay, now I see the issue. Every file in my directory has a different "add_offset" and "scale_factor", so it is impossible to concatenate these files while they are packed and I can only concatenate the unpacked files. My problem is that I cannot repack the files when I'm done because they are so large: each unpacked file is about 50 GB. So my only solution is to find a machine with more RAM – Jonathan Wille Jul 17 '20 at 12:45
  • Thanks. I'll look into this. I'm developing a python package that uses CDO as a backend (https://nctoolkit.readthedocs.io/en/latest/), so I probably need to add a solution to this kind of problem to it – Robert Wilson Jul 17 '20 at 13:14
  • True, originally I was using cdo -b F64 mergetime, but I found cdo -b F32 mergetime works as well and gives me files half as large. Your package looks like a very helpful companion to the python netcdf package. Good luck! – Jonathan Wille Jul 17 '20 at 13:30
  • Thanks. It does have the ability to change the precision, but I might just leave it at that – Robert Wilson Jul 17 '20 at 14:42
  • @RobertWilson Clearly if you are concatenating files with very different values of `add_offset`, then preserving the input data length may lead to enough loss of precision to care about, but equally, if the values are reasonably similar then it may be fine. With this in mind, if you are implementing this feature it would be good to offer the user a choice of output precision. (Or maybe this is what you were already referring to?) – alani Jul 18 '20 at 21:28
  • The package lets users change output precision (`nc.options(precision = whatever)`), but what I possibly need is a way of checking ensembles for the offset issue mentioned here. Though it's possible it's not common enough to make it worth implementing. – Robert Wilson Jul 19 '20 at 09:55
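The check discussed in the comments above (do all files share the same packing attributes?) can be sketched in plain Python by parsing the text that `ncdump -h` prints. This is only an illustrative sketch, not part of nctoolkit, NCO, or CDO: the helper names `read_header`, `packing_attrs`, and `mismatched_variables` are hypothetical, and the regular expression assumes the usual CDL layout `var:attr = value ;`.

```python
import re
import subprocess
from collections import defaultdict

# Matches CDL attribute lines such as "    t2m:scale_factor = 0.001 ;"
ATTR_RE = re.compile(
    r"^\s*(\w+):(scale_factor|add_offset)\s*=\s*([^;]+?)\s*;", re.MULTILINE
)


def read_header(path):
    """Return the CDL header of a netCDF file (requires ncdump on PATH)."""
    result = subprocess.run(
        ["ncdump", "-h", str(path)], capture_output=True, text=True, check=True
    )
    return result.stdout


def packing_attrs(cdl_header):
    """Map each variable to the packing attributes found in a CDL header."""
    attrs = defaultdict(dict)
    for var, name, value in ATTR_RE.findall(cdl_header):
        # CDL marks single-precision values with a trailing "f"; drop it.
        attrs[var][name] = float(value.rstrip("fFdD"))
    return dict(attrs)


def mismatched_variables(headers):
    """headers: {filename: CDL text}. Return variables whose packing differs."""
    seen = defaultdict(set)
    for cdl in headers.values():
        for var, a in packing_attrs(cdl).items():
            # Missing attributes fall back to the identity packing (1, 0).
            seen[var].add((a.get("scale_factor", 1.0), a.get("add_offset", 0.0)))
    return sorted(var for var, pairs in seen.items() if len(pairs) > 1)
```

A possible use would be `mismatched_variables({p: read_header(p) for p in paths})`, where `paths` is your own list of input files; any variable it returns would need unpacking (or a float output precision) before concatenation.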

2 Answers

Instead of repacking the large files once they are concatenated you could try netCDF4 compression, e.g.,

ncpdq -U -7 -L 1 inN.nc in_upk_cmpN.nc # Loop over N
ncrcat in_upk_cmp*.nc out.nc
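The "loop over N" in the commands above could be driven from Python roughly as follows. This is a sketch under assumptions: the function names and the `_upk_cmp` output suffix are made up for illustration, while the flags are the ones shown above (`-U` unpack, `-7` netCDF4 classic output, `-L 1` deflate level 1). The commands are built as lists first so they can be inspected before anything is executed.

```python
import subprocess
from pathlib import Path


def nco_commands(inputs, out_path, level=1):
    """Build the ncpdq (unpack + compress) and ncrcat command lines."""
    cmds = []
    unpacked = []
    for src in inputs:
        src = Path(src)
        # Hypothetical naming: in1.nc -> in1_upk_cmp.nc, next to the input.
        dst = src.with_name(src.stem + "_upk_cmp.nc")
        cmds.append(["ncpdq", "-U", "-7", "-L", str(level), str(src), str(dst)])
        unpacked.append(str(dst))
    # Concatenate the unpacked, compressed files along the record dimension.
    cmds.append(["ncrcat", *unpacked, str(out_path)])
    return cmds


def run(cmds):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

For example, `run(nco_commands(["in1.nc", "in2.nc"], "out.nc"))` would unpack and compress each input and then concatenate the results, assuming NCO is installed and the inputs are passed in time order.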

Good luck!

Charlie Zender

When data are packed, CDO will often throw an error because too much precision would be lost. Writing the output as 32-bit floats with

cdo -b F32 mergetime in*.nc out.nc

should do the trick and avoid the error. If you want to then compress the files you can try this:

cdo -z zip_9 copy out.nc out_compressed.nc 
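To see why packed inputs cause trouble, recall the netCDF packing convention: unpacked = stored * scale_factor + add_offset, with the stored value a 16-bit integer. The sketch below uses made-up temperature ranges for two hypothetical files A and B (not taken from the ERA5 files in question), and `pack_params` is one common way to choose the attributes, not necessarily the one used to create those files. It shows that a value from B is mis-decoded with A's attributes, and that the same value does not fit in a short when packed with A's attributes, which is one plausible reading of the "Numeric conversion not representable" error.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def pack_params(vmin, vmax):
    """scale_factor and add_offset that map [vmin, vmax] onto the int16 range."""
    scale = (vmax - vmin) / (INT16_MAX - INT16_MIN)
    offset = vmin - INT16_MIN * scale
    return scale, offset


def pack(value, scale, offset):
    """Integer that would be stored on disk (no range check applied)."""
    return round((value - offset) / scale)


def unpack(stored, scale, offset):
    """netCDF packing convention: stored * scale_factor + add_offset."""
    return stored * scale + offset


# Hypothetical value ranges in kelvin for two files.
scale_a, offset_a = pack_params(250.0, 270.0)  # file A
scale_b, offset_b = pack_params(280.0, 310.0)  # file B

stored_b = pack(300.0, scale_b, offset_b)       # a 300 K value as stored in B
right = unpack(stored_b, scale_b, offset_b)     # decoded with B's attributes
wrong = unpack(stored_b, scale_a, offset_a)     # decoded with A's attributes

# Packing B's value with A's attributes needs more than 16 bits.
overflow = pack(300.0, scale_a, offset_a)
fits_in_short = INT16_MIN <= overflow <= INT16_MAX

print(right, wrong, overflow, fits_in_short)
```

Writing floats instead of shorts sidesteps both problems because the output no longer depends on any single file's scale_factor and add_offset.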
ClimateUnboxed