
I have a bucket on Google Cloud Storage that contains multiple NetCDF files. Normally, when the files are stored locally, I would do:

import netCDF4

nc = netCDF4.Dataset('path/to/netcdf.nc')

Is it possible to do this in Python straight from the Google Cloud bucket, without having to download the file first?

JWB

2 Answers


This function works for loading NetCDF files from a Google Cloud Storage bucket:

import xarray as xr
import fsspec

def load_dataset(filename, engine="h5netcdf", *args, **kwargs) -> xr.Dataset:
    """Load a NetCDF dataset from the local file system or a cloud bucket."""
    # fsspec resolves both local paths and URLs such as gs://bucket/file.nc
    with fsspec.open(filename, mode="rb") as file:
        dataset = xr.load_dataset(file, *args, engine=engine, **kwargs)
    return dataset

dataset = load_dataset("gs://bucket-name/path/to/file.nc")
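If the bucket is private, fsspec (which uses gcsfs under the hood for gs:// URLs) will need your Google Cloud credentials. For a public bucket, a rough sketch is to pass anonymous access through to gcsfs; the bucket and file names here are placeholders:

# sketch: extra keyword arguments to fsspec.open are forwarded to the gcsfs filesystem
with fsspec.open("gs://bucket-name/path/to/file.nc", mode="rb", token="anon") as file:
    dataset = xr.load_dataset(file, engine="h5netcdf")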
Jack Kelly

I'm not sure how to work with the Google object store, but here's how you can open a NetCDF file from an in-memory buffer containing all the bytes of the file:

from netCDF4 import Dataset

# read the whole file into memory, then hand the bytes to netCDF4
with open('path/to/netcdf.nc', 'rb') as fobj:
    data = fobj.read()
nc = Dataset('memory', memory=data)

So the path forward would be to read all the bytes from the object store and then use that command to open them. That has drawbacks for large NetCDF files, because you're holding the entire file in system memory.
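For Google Cloud Storage specifically, a minimal sketch of that approach (assuming the google-cloud-storage package is installed, and with placeholder bucket and object names) would be to download the blob's bytes and pass them straight to Dataset:

from google.cloud import storage
from netCDF4 import Dataset

client = storage.Client()  # uses your default Google Cloud credentials
blob = client.bucket("bucket-name").blob("path/to/netcdf.nc")
data = blob.download_as_bytes()      # the entire file ends up in memory
nc = Dataset('memory', memory=data)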

DopplerShift