I want to compress arrays stored inside a netCDF file by converting their data type (e.g. float32 to int16) and storing a scale factor and add offset that are applied to the arrays.
I want to turn raster data that would normally be too big to work with in Python into smaller, more manageable rasters. I know that scale factors and offsets can be applied to netCDF data to make files smaller, but would the same logic also help with memory management once the arrays are loaded? I already have a different NumPy-based method for making large arrays manageable, but I would like to achieve this with netCDF files.
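The same packing idea does carry over to memory: an int16 array takes half the bytes of the float32 array it replaces, and only the part currently being worked on needs to be unpacked. A minimal NumPy-only sketch, assuming a synthetic 1000 × 1000 float32 array in place of a real raster and the same scale/offset formula as the question's code; `pack_int16` and `unpack` are hypothetical helper names:

```python
import numpy as np


def pack_int16(data, scale_factor, add_offset):
    """Quantize floats to int16: round((x - add_offset) / scale_factor)."""
    packed = np.round((data - add_offset) / scale_factor)
    # guard against float rounding pushing a value just past the int16 limits
    return np.clip(packed, -32768, 32767).astype(np.int16)


def unpack(packed, scale_factor, add_offset):
    """Approximate the original values: packed * scale_factor + add_offset."""
    return (packed * scale_factor + add_offset).astype(np.float32)


# Hypothetical stand-in for a large raster band
rng = np.random.default_rng(0)
data = rng.uniform(-50.0, 150.0, size=(1000, 1000)).astype(np.float32)

# Same formula as compute_scale_and_offset with n = 16
lo, hi = float(data.min()), float(data.max())
scale_factor = (hi - lo) / (2 ** 16 - 1)
add_offset = lo + 2 ** 15 * scale_factor

packed = pack_int16(data, scale_factor, add_offset)
print(data.nbytes, packed.nbytes)  # 4000000 2000000

# Unpack only the window that is needed, never the whole float32 array
window = unpack(packed[:100, :100], scale_factor, add_offset)
max_err = float(np.abs(window - data[:100, :100]).max())
print(max_err, scale_factor / 2)
```

The price is precision: each value is recovered to within roughly half a `scale_factor`, which is usually acceptable for continuous raster data but should be checked against the data's real precision.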
I already have the following code, which is based on several sources, including http://james.hiebert.name/blog/work/2015/04/18/NetCDF-Scale-Factors.html
The test file I am using is one I generated myself: a netCDF file holding a float32 NumPy array, converted from a GeoTIFF with gdal_translate.
```python
import netCDF4
import numpy as np


def compute_scale_and_offset(vmin, vmax, n):
    # stretch/compress data to the available packed range
    scale_factor = (vmax - vmin) / (2 ** n - 1)
    # translate the range to be symmetric about zero
    add_offset = vmin + 2 ** (n - 1) * scale_factor
    return scale_factor, add_offset


def pack_value(unpacked_value, scale_factor, add_offset):
    # subtract the offset first, then divide the whole difference by the scale
    return (unpacked_value - add_offset) / scale_factor


def unpack_value(packed_value, scale_factor, add_offset):
    return packed_value * scale_factor + add_offset


netcdf_path = r"path/to/netcdf"
nc = netCDF4.Dataset(netcdf_path, "a")
data = nc.variables['Band1'][:]
scale_factor, offset = compute_scale_and_offset(np.min(data), np.max(data), 16)
data = pack_value(data, scale_factor, offset)
# round to the nearest integer before casting (astype alone truncates toward zero)
data_b = np.round(data).astype(np.int16)
# Band1 is still a float32 variable, so these int16 values are cast back to float32 on write
nc.variables['Band1'][:] = data_b
nc.close()
```
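The packing arithmetic in the code above works out as follows, writing $s$ for `scale_factor`, $o$ for `add_offset`, $x$ for an unpacked value and $p$ for its packed value (the CF conventions define unpacking as unpacked = packed × scale_factor + add_offset):

$$
\begin{aligned}
s &= \frac{x_{\max} - x_{\min}}{2^{n} - 1}, \qquad o = x_{\min} + 2^{n-1} s,\\
p(x) &= \frac{x - o}{s}, \qquad x \approx p\,s + o,\\
p(x_{\min}) &= \frac{x_{\min} - x_{\min} - 2^{n-1} s}{s} = -2^{n-1},\\
p(x_{\max}) &= \frac{x_{\max} - x_{\min}}{s} - 2^{n-1} = (2^{n} - 1) - 2^{n-1} = 2^{n-1} - 1.
\end{aligned}
$$

For $n = 16$ the packed range is exactly the int16 range $[-32768, 32767]$, so every int16 value is used and none is left over for a `_FillValue`. Rounding $p$ to the nearest integer bounds the error per value by $s/2$. The division has to apply to the whole difference $x - o$; $x - o/s$ is a different quantity.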
Right now the file does not change in size when I run the code above, although the values in the data array do change. What I am looking for is a change to the code above that works on any generic netCDF file: it should convert the data array and store the scale factor and offset in the file, so that netCDF4 applies them automatically when the data is read back in.