1

I am trying to read a HDF4 file (https://www.dropbox.com/s/5d40ukfsu0yupwl/MOD13A2.A2016001.h23v05.006.2016029070140.hdf?dl=0).

import os
import numpy as np
from pyhdf.SD import SD, SDC

# Open file.
FILE_NAME = 'MOD13A2.A2016001.h23v05.006.2016029070140.hdf'
hdf = SD(FILE_NAME, SDC.READ)

# List available SDS datasets.
print (hdf.datasets())

# Read dataset.
DATAFIELD_NAME="1_km_16_days_NDVI"
data2D = hdf.select(DATAFIELD_NAME)
data = data2D[:,:] 

When I executed this script, I am getting following error: Traceback (most recent call last): File "Test.py", line 15, in data2D = hdf.select(DATAFIELD_NAME) File "C:\Python35\lib\site-packages\pyhdf\SD.py", line 1599, in select raise HDF4Error("select: non-existent dataset") pyhdf.error.HDF4Error: select: non-existent dataset

I have used similar python code for reading other HDF4 files and it works well. But I am unable to understand problem in this case.

Yogesh Sathe
  • 41
  • 1
  • 7

1 Answers1

0

If you look carefully at the output of print(hdf.datasets()) you would notice the following line:

 ...
 '1 km 16 days NDVI': (('YDim:MODIS_Grid_16DAY_1km_VI',
                        'XDim:MODIS_Grid_16DAY_1km_VI'),
                       (1200, 1200),
                       22,
                       0),
 ...

The key is the name of the dataset. Note that the name uses spaces to separate words, not the underscores as in your example.

If you replace the DATAFIELD_NAME with DATAFIELD_NAME='1 km 16 days NDVI'. That is replace underscores with spaces in the dataset name, your code would work without the error.

This is enough to access the data within the tile, but geolocation requires more work. At the moment you already can visualize the data with

from matplotlib import pyplot as plt
plt.imshow(data)
plt.colorbar()
plt.show()

enter image description here

What remains to be done is

  • scaling to scientific units
  • geolocation

As an aside, do use the pprint module to output large dictionaries with complex structure, i.e. replace print(hdf.datasets()) with

import pprint
...
pprint.pprint(hdf.datasets())
Dima Chubarov
  • 16,199
  • 6
  • 40
  • 76
  • Thanks for the reply. It works fine now. I am facing another problem with this dataset. I am unable to extract the geolocation fields (i.e. Lat/Lon) from this file. When I opened HDF file in Panoply software it shows XDim and YDim arrays but pprint outptut or HDFView software doesnt show these arrays. Could you please help me on this? – Yogesh Sathe Apr 06 '18 at 08:21
  • @YogeshSathe HDF-EOS tile datafiles do not carry geolocation arrays. Since the data is located on a regular grid (in MODIS sinusoidal projection), you can restore geolocation of each pixel from the coordinates of the tile's corners that can be read from `CoreMetadata.0` and `StructMetadata.0` attributes. Checkout the output from `print(getattr(hdf,'CoreMetadata.0'))` and `print(getattr(hdf,'StructMetadata.0'))`. – Dima Chubarov Apr 06 '18 at 09:08
  • Thank you once again for your reply. I tried the way suggested by you but I am unable to extract the geolocation. Could you please give an example/link of extracting geolocation for each pixel in python – Yogesh Sathe Apr 06 '18 at 10:00
  • 1
    After searching in your direction I got this example which reads lat/lon also for specified file [link] (http://hdfeos.org/zoo/LPDAAC/MOD13A1_500m_16_days_EVI.py) – Yogesh Sathe Apr 06 '18 at 10:13
  • @YogeshSathe this example relies on external tools, either GDAL or `eos2dump`. IMHO it is a shame pyhdf development stopped short from doing the calculations entirely in Python. – Dima Chubarov Apr 06 '18 at 10:16
  • Hi.... I have wrote python code to extract Lat/Long and NDVI data from MODIS Sinusoidal projection HDF file using GDAL and PYPROJ. – Yogesh Sathe Jun 01 '18 at 07:11