1

Does _FillValue or missing_value still occupy storage space?

If a 2-dimensional array contains some null values, how can I write it to a netCDF file in a way that saves storage space?

ClimateUnboxed
Li Ziming

3 Answers

3

In netCDF3, every value requires the same amount of disk space, so fill values are stored like any other value. In netCDF4, you can reduce the required disk space with gzip-style (zlib) compression. The actual compression ratio depends on the data: if there are many identical values (for example, missing data), you can achieve good results. Here is an example in Python:

import netCDF4
import numpy as np
import os

# Define sample data with all elements masked out
N = 1000
data = np.ma.masked_all((N, N))

# Write data to netCDF file using different data formats
for fmt in ('NETCDF3_CLASSIC', 'NETCDF4'):
    fname = 'test.nc'
    ds = netCDF4.Dataset(fname, format=fmt, mode='w')
    xdim = ds.createDimension(dimname='x', size=N)
    ydim = ds.createDimension(dimname='y', size=N)
    var = ds.createVariable(
        varname='data',
        dimensions=(ydim.name, xdim.name),
        fill_value=-999,
        datatype='f4',
        complevel=9,  # set gzip compression level
        zlib=True  # enable compression
    )
    var[:] = data
    ds.close()

    # Determine file size
    print(fmt, os.stat(fname).st_size)

See the netCDF4-python documentation, section 9) "Efficient compression of netCDF variables" for details.
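To illustrate why the ratio depends on the data, here is a minimal sketch that uses only Python's standard zlib module on raw float32 bytes. It is a stand-in, not what the netCDF4 library does internally (netCDF4 compresses chunk by chunk and can apply extra filters); the buffer size, the random value range, and the helper name deflated_size are assumptions made for illustration.

```python
import random
import struct
import zlib

N = 100_000    # number of float32 values per sample buffer (arbitrary choice)
FILL = -999.0  # same fill value as in the example above

# Buffer in which every element is the fill value (a fully "missing" field)
all_fill = struct.pack(f"<{N}f", *([FILL] * N))

# Buffer of random values, standing in for heterogeneous data
rng = random.Random(0)
noisy = struct.pack(f"<{N}f", *(rng.uniform(-1e3, 1e3) for _ in range(N)))


def deflated_size(buf, level=9):
    """Return the number of bytes after zlib (DEFLATE) compression."""
    return len(zlib.compress(buf, level))


fill_size = deflated_size(all_fill)
noisy_size = deflated_size(noisy)

print("raw bytes:       ", len(all_fill))
print("all fill values: ", fill_size)
print("random values:   ", noisy_size)
```

The all-fill buffer should shrink to a tiny fraction of its raw size, while the random buffer should stay close to its raw size. Real fields usually fall somewhere in between.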

sfinkens
2

Just to add to the excellent answer from Funkensieper, you can copy and compress files from the command line using cdo:

 cdo -f nc4c -z zip_9 copy in.nc out.nc

You could also compress files with gzip, zip, or similar tools, but then you need to decompress them before reading. Using the built-in netCDF4 compression avoids this.

You can select the compression level X with -z zip_X. If your files are very large, you may want to accept a slightly larger file in return for faster access times (e.g. by using zip_5 or zip_6 instead of zip_9). In many cases with heterogeneous data, the compression gain is small relative to the uncompressed file.
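The size-versus-speed trade-off between levels can be sketched with Python's standard zlib module, which implements the same DEFLATE algorithm. This is only an illustration: it times compression of one in-memory buffer, not cdo or netCDF reads, and the synthetic field (roughly half fill values, half random values) is an assumption.

```python
import random
import struct
import time
import zlib

# Synthetic field: about half fill values, half random values (an assumption)
rng = random.Random(1)
N = 200_000
values = [-999.0 if rng.random() < 0.5 else rng.uniform(0.0, 50.0)
          for _ in range(N)]
raw = struct.pack(f"<{N}f", *values)

results = {}
for level in (1, 5, 9):
    start = time.perf_counter()
    packed = zlib.compress(raw, level)
    elapsed = time.perf_counter() - start
    # Store compressed size in bytes and compression time in seconds
    results[level] = (len(packed), elapsed)
    print(f"level {level}: {len(packed)} bytes, {elapsed * 1e3:.1f} ms")

print("raw size:", len(raw), "bytes")
```

Timings vary by machine and data, so it is worth running a test like this on a representative sample of your own files before settling on a level.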

ClimateUnboxed
2

Or similarly with NCO:

ncks -7 -L 9 in.nc out.nc

Charlie Zender