I am trying to read netCDF files stored in my S3 bucket using xarray. The sample code below runs fine when the same file is in a local folder, such as ~/downloads/60e0489fcab82c714f516064b4e6b7acf724b7b9.nc, but I am new to S3 and not sure what I am missing.

I am trying to read netCDF via xarray and convert it to CSV. Boto3 doesn't work for reading netCDF4 and converting it to CSV.

Below is my Lambda function:

import xarray

def handler(event, context):
    
    filename = 's3://netcdf-files/60e0489fcab82c714f516064b4e6b7acf724b7b9.nc'
    ds= xarray.open_dataset(filename)
    for varname in ds:
        print(varname)

    tas0=ds['wet_bulb_potential_temperature']
    tas0

    return {
        'statusCode': 200,
        'message': 'Hello from Python Lambda Function!'
    }

I am getting the error below. My S3 file path isn't recognized; instead, Lambda tries to find the file at a local path. Error message from the CloudWatch logs:

File "/opt/python/lib/python3.6/site-packages/xarray/backends/file_manager.py", line 204, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "netCDF4/_netCDF4.pyx", line 2321, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1885, in netCDF4._netCDF4._ensure_nc_success

FileNotFoundError: [Errno 2] No such file or directory: b'/var/task/s3:/netcdf-files/60e0489fcab82c714f516064b4e6b7acf724b7b9.nc' 
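
The mangled path in the traceback is consistent with the `s3://` URL being treated as a relative local path: it is joined onto the working directory and normalized, which collapses the double slash. The sketch below reproduces that string with the standard library only. The helper name `resolve_as_local_path` is hypothetical, and `/var/task` as the working directory is an assumption taken from the error message; this illustrates the symptom and does not reproduce xarray's actual code.

```python
import posixpath


def resolve_as_local_path(path, cwd="/var/task"):
    """Resolve a string the way a relative POSIX path is resolved against cwd."""
    # join() keeps the URL intact because it does not start with "/";
    # normpath() then collapses "//" into a single "/".
    return posixpath.normpath(posixpath.join(cwd, path))


url = "s3://netcdf-files/60e0489fcab82c714f516064b4e6b7acf724b7b9.nc"
resolved = resolve_as_local_path(url)
print(resolved)
# /var/task/s3:/netcdf-files/60e0489fcab82c714f516064b4e6b7acf724b7b9.nc
```

The result matches the path in the `FileNotFoundError`, which suggests the URL never reached any S3-aware code.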

1 Answer

EDIT 2021

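From v0.16.2, xarray supports S3 buckets through general fsspec URLs (as the comments below point out, this appears to apply to Zarr stores rather than netCDF files): http://xarray.pydata.org/en/stable/user-guide/io.html#cloud-storage-buckets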


Old answer

If you need to use an older version, you can use s3fs instead:

import xarray
import s3fs

def handler(event, context):
    
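    fs = s3fs.S3FileSystem(anon=True)  # or anon=False to use default credentials

    # Pass the open file object (not a path string) to xarray. Reading from a
    # file object typically needs the h5netcdf (or scipy) backend installed.
    with fs.open('netcdf-files/60e0489fcab82c714f516064b4e6b7acf724b7b9.nc', 'rb') as f:
        ds = xarray.open_dataset(f)
        for varname in ds:
            print(varname)

        # xarray loads data lazily, so access variables while the file is open.
        tas0 = ds['wet_bulb_potential_temperature']
        print(tas0)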

    return {
        'statusCode': 200,
        'message': 'Hello from Python Lambda Function!'
    }
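
The path passed to `fs.open` above is in `bucket/key` form, while the question starts from an `s3://` URL. As a minimal sketch of how the two relate, the hypothetical helper `s3_url_to_path` below (not part of s3fs or xarray) converts one to the other using only the standard library. s3fs may well accept the full URL directly; this only makes the mapping explicit.

```python
from urllib.parse import urlsplit


def s3_url_to_path(url):
    """Turn 's3://bucket/key' into the 'bucket/key' form used with fs.open."""
    parts = urlsplit(url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    # netloc is the bucket name; path is the key with a leading slash.
    return parts.netloc + parts.path


path = s3_url_to_path("s3://netcdf-files/60e0489fcab82c714f516064b4e6b7acf724b7b9.nc")
print(path)  # netcdf-files/60e0489fcab82c714f516064b4e6b7acf724b7b9.nc
```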
  • I could be wrong, but I think the statement "From v0.16.2 S3 buckets are supported" is only true for loading Zarr datasets, not NetCDF files. – Jack Kelly Feb 03 '22 at 15:03
  • Indeed, that appears to be the case, reading the second bullet point from the end at https://docs.xarray.dev/en/stable/whats-new.html#id179 – ogb119 Aug 04 '22 at 14:16