
I am trying to read HDF files, over an HTTPS connection, from the Harmonized Landsat Sentinel repository (here: https://hls.gsfc.nasa.gov/data/v1.4/).

Ideally, I would use xarray to do this. Here is an example of reading over HTTPS:

xr.open_rasterio('https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf')

<xarray.DataArray (band: 1, y: 3660, x: 3660)>
[13395600 values with dtype=int16]
Coordinates:
  * band     (band) int64 1
  * y        (y) float64 4.6e+06 4.6e+06 4.6e+06 ... 4.49e+06 4.49e+06 4.49e+06
  * x        (x) float64 5e+05 5e+05 5.001e+05 ... 6.097e+05 6.097e+05 6.098e+05
Attributes:
    transform:                 (30.0, -0.0, 499980.0, -0.0, -30.0, 4600020.0)
    crs:                       +init=epsg:32613
    res:                       (30.0, 30.0)
    is_tiled:                  0
    nodatavals:                (nan,)
    scales:                    (1.0,)
    offsets:                   (0.0,)
    bands:                     1
    byte_order:                0
    coordinate_system_string:  PROJCS["UTM_Zone_13N",GEOGCS["GCS_WGS_1984",DA...
    data_type:                 2
    description:               HDF Imported into ENVI.
    file_type:                 HDF Scientific Data
    header_offset:             0
    interleave:                bsq
    lines:                     3660
    samples:                   3660

Note these files contain multiple datasets/bands, so the single-band result above is incorrect.
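
For reference, the individual bands do show up as GDAL subdatasets when the container file is opened with rasterio directly. A rough sketch of listing them (assuming a local copy of the file and a GDAL build with HDF4 support):

import rasterio

# Open the HDF container and print the subdataset names that hold the bands.
with rasterio.open('HLS.S30.T13TEF.2017002.v1.4.hdf') as src:
    print(src.subdatasets)

Each subdataset name could then, in principle, be passed to xr.open_rasterio individually.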

xr.open_dataset('https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf')

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    194             try:
--> 195                 file = self._cache[self._key]
    196             except KeyError:

/opt/conda/lib/python3.7/site-packages/xarray/backends/lru_cache.py in __getitem__(self, key)
     42         with self._lock:
---> 43             value = self._cache[key]
     44             self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
<ipython-input-85-7765ae565af3> in <module>
----> 1 xr.open_dataset('https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf')

/opt/conda/lib/python3.7/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime)
    497         if engine == "netcdf4":
    498             store = backends.NetCDF4DataStore.open(
--> 499                 filename_or_obj, group=group, lock=lock, **backend_kwargs
    500             )
    501         elif engine == "scipy":

/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
    387             netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
    388         )
--> 389         return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
    390 
    391     def _acquire(self, needs_lock=True):

/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in __init__(self, manager, group, mode, lock, autoclose)
    333         self._group = group
    334         self._mode = mode
--> 335         self.format = self.ds.data_model
    336         self._filename = self.ds.filepath()
    337         self.is_remote = is_remote_uri(self._filename)

/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in ds(self)
    396     @property
    397     def ds(self):
--> 398         return self._acquire()
    399 
    400     def open_store_variable(self, name, var):

/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in _acquire(self, needs_lock)
    390 
    391     def _acquire(self, needs_lock=True):
--> 392         with self._manager.acquire_context(needs_lock) as root:
    393             ds = _nc4_require_group(root, self._group, self._mode)
    394         return ds

/opt/conda/lib/python3.7/contextlib.py in __enter__(self)
    110         del self.args, self.kwds, self.func
    111         try:
--> 112             return next(self.gen)
    113         except StopIteration:
    114             raise RuntimeError("generator didn't yield") from None

/opt/conda/lib/python3.7/site-packages/xarray/backends/file_manager.py in acquire_context(self, needs_lock)
    181     def acquire_context(self, needs_lock=True):
    182         """Context manager for acquiring a file."""
--> 183         file, cached = self._acquire_with_cache_info(needs_lock)
    184         try:
    185             yield file

/opt/conda/lib/python3.7/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    199                     kwargs = kwargs.copy()
    200                     kwargs["mode"] = self._mode
--> 201                 file = self._opener(*self._args, **kwargs)
    202                 if self._mode == "w":
    203                     # ensure file doesn't get overriden when opened again

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -90] NetCDF: file not found: b'https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf'

When read from disk:

xr.open_rasterio('HLS.S30.T13TEF.2017002.v1.4.hdf')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-88-f4ae5075928a> in <module>
----> 1 xr.open_rasterio('HLS.S30.T13TEF.2017002.v1.4.hdf')

/opt/conda/lib/python3.7/site-packages/xarray/backends/rasterio_.py in open_rasterio(filename, parse_coordinates, chunks, cache, lock)
    250     # Get bands
    251     if riods.count < 1:
--> 252         raise ValueError("Unknown dims")
    253     coords["band"] = np.asarray(riods.indexes)
    254 

ValueError: Unknown dims

and

xr.open_dataset('/home/rowangaffney/Desktop/HLS.S30.T13TEF.2017002.v1.4.hdf')

<xarray.Dataset>
Dimensions:  (XDim_Grid: 3660, YDim_Grid: 3660)
Dimensions without coordinates: XDim_Grid, YDim_Grid
Data variables:
    B01      (YDim_Grid, XDim_Grid) float32 ...
    B02      (YDim_Grid, XDim_Grid) float32 ...
    B03      (YDim_Grid, XDim_Grid) float32 ...
    B04      (YDim_Grid, XDim_Grid) float32 ...
    B05      (YDim_Grid, XDim_Grid) float32 ...
    B06      (YDim_Grid, XDim_Grid) float32 ...
    B07      (YDim_Grid, XDim_Grid) float32 ...
    B08      (YDim_Grid, XDim_Grid) float32 ...
    B8A      (YDim_Grid, XDim_Grid) float32 ...
    B09      (YDim_Grid, XDim_Grid) float32 ...
    B10      (YDim_Grid, XDim_Grid) float32 ...
    B11      (YDim_Grid, XDim_Grid) float32 ...
    B12      (YDim_Grid, XDim_Grid) float32 ...
    QA       (YDim_Grid, XDim_Grid) float32 ...
Attributes:
    PRODUCT_URI:                                       S2A_MSIL1C_20170102T17...
    L1C_IMAGE_QUALITY:                                 SENSOR:PASSED GEOMETRI...
    SPACECRAFT_NAME:                                   Sentinel-2A
    TILE_ID:                                           S2A_OPER_MSI_L1C_TL_SG...
    DATASTRIP_ID:                                      S2A_OPER_MSI_L1C_DS_SG...
    PROCESSING_BASELINE:                               02.04
    SENSING_TIME:                                      2017-01-02T17:58:23.575Z
    L1_PROCESSING_TIME:                                2017-01-02T21:41:37.84...
    HORIZONTAL_CS_NAME:                                WGS84 / UTM zone 13N
    HORIZONTAL_CS_CODE:                                EPSG:32613
    NROWS:                                             3660
    NCOLS:                                             3660
    SPATIAL_RESOLUTION:                                30
    ULX:                                               499980.0
    ULY:                                               4600020.0
    MEAN_SUN_ZENITH_ANGLE(B01):                        65.3577462333765
    MEAN_SUN_AZIMUTH_ANGLE(B01):                       165.01162242158
    MEAN_VIEW_ZENITH_ANGLE(B01):                       8.10178275092502
    MEAN_VIEW_AZIMUTH_ANGLE(B01):                      285.224586475702
    spatial_coverage:                                  89
    cloud_coverage:                                    72
    ACCODE:                                            LaSRCS2AV3.5.5
    arop_s2_refimg:                                    NONE
    arop_ncp:                                          0
    arop_rmse(meters):                                 0.0
    arop_ave_xshift(meters):                           0.0
    arop_ave_yshift(meters):                           0.0
    HLS_PROCESSING_TIME:                               2018-02-24T18:17:49Z
    NBAR_Solar_Zenith:                                 44.82820466504637
    AngleBand:                                         [ 0  1  2  3  4  5  6 ...
    MSI band 01 bandpass adjustment slope and offset:  0.995900, -0.000200
    MSI band 02 bandpass adjustment slope and offset:  0.977800, -0.004000
    MSI band 03 bandpass adjustment slope and offset:  1.005300, -0.000900
    MSI band 04 bandpass adjustment slope and offset:  0.976500, 0.000900
    MSI band 8a bandpass adjustment slope and offset:  0.998300, -0.000100
    MSI band 11 bandpass adjustment slope and offset:  0.998700, -0.001100
    MSI band 12 bandpass adjustment slope and offset:  1.003000, -0.001200
    StructMetadata.0:                                  GROUP=SwathStructure\n.

Any ideas on best practices for reading these data over HTTPS?

Thanks!

Rowan_Gaffney

1 Answer

I recommend reading http://matthewrocklin.com/blog/work/2018/02/06/hdf-in-the-cloud to understand why accessing HDF5 files directly over HTTPS is not as easy as it seems. So not exactly a solution, but you'll probably need to download the data and load it from a local copy (in the short term at least).
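
In case it helps, a minimal sketch of that download-then-open workaround (the local filename is just illustrative):

import urllib.request
import xarray as xr

# Download the file over HTTPS first, then open the local copy with xarray.
url = ("https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/"
       "HLS.S30.T13TEF.2017002.v1.4.hdf")
local_path, _ = urllib.request.urlretrieve(url, "HLS.S30.T13TEF.2017002.v1.4.hdf")
ds = xr.open_dataset(local_path)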

Oh, and you might want to try using the 'h5netcdf' engine to read the file instead:

xr.open_dataset("HLS.S30.T13TEF.2017002.v1.4.hdf", engine="h5netcdf")

and if you're interested in just one band, do something like this:

xr.open_dataset("HLS.S30.T13TEF.2017002.v1.4.hdf", engine="h5netcdf", group="B01")

Just a note for others though: the code below would work in some cases, if you use xarray with the 'h5netcdf' engine, have the 'h5pyd' library installed, and the URL points to data served through an HDF REST API interface:

xr.open_dataset(
    "https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf",
    engine="h5netcdf",
)

But unfortunately, that's not quite the case with these NASA datasets...
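
For completeness, direct remote access in that scenario would look roughly like this with h5pyd (the endpoint and domain below are purely illustrative, and this only works against an actual HDF REST API / HSDS server, which hls.gsfc.nasa.gov is not):

import h5pyd

# Connect to a hypothetical HSDS endpoint and browse the served file like an h5py.File.
f = h5pyd.File("/shared/HLS.S30.T13TEF.2017002.v1.4.h5", "r",
               endpoint="https://hsds.example.org")
print(list(f.keys()))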

weiji14