In Python (using pytables), it is easy to create HDF5 tables with rows containing timestamps (column datatype Time64, see http://pytables.github.io/usersguide/datatypes.html).

Is it possible to read in tables containing columns with type Time64 in IDL 8.2? The default approach of

fid = H5F_OPEN(filename)
tabID = H5D_OPEN(fid, '/path/to/table')
data = H5D_READ(tabID)

seems to choke if the node /path/to/table contains a column of type Time64. I assume there is a way of converting or reinterpreting the datatype in IDL, even if IDL does not support it natively. After all, the Time64 columns are just 8-byte values...

The most relevant IDL documentation that I could find was http://www.exelisvis.com/docs/HDF5_Overview.html.

As a side question: HDFView from the HDF Group does not seem to support Time64 either, although a special 8-byte column type exists in HDF5 (sorry, I'm not allowed to post another link). Is the column type used by pytables perhaps not a standard HDF5 type?


Edit: I have created an example HDF5 file containing a table with a Time64 column; see the comments for a link. The file was created with the following Python code:

import tables as T
import time

exampleTableColumns = {
    'id': T.Int32Col(pos=0),
    'value': T.Float32Col(pos=1),
    'timestamp': T.Time64Col(pos=2),
    }
with T.openFile('time64-example.h5', 'w') as h5:
    exampleTab = h5.createTable(
        '/', 'example', exampleTableColumns)
    # Add some test values
    t = time.time()
    for i in range(10):
        exampleTab.row['id'] = i
        exampleTab.row['value'] = i**2
        exampleTab.row['timestamp'] = t + 0.5*i
        exampleTab.row.append()
    exampleTab.flush()

My attempt at reading it from IDL is:

fid = h5f_open(filename)
exampleTab = H5D_OPEN(fid, '/example')
; id: 32 bit signed integer, value: float32, timestamp: 8 byte value
struct = {id:0L, value:0.0, timestamp:0LL}
dt = H5T_IDL_CREATE(struct)
exampleData = H5D_READ(exampleTab, dt)
print, 'exampleData.id:', exampleData.id
print, 'exampleData.value', exampleData.value
print, 'exampleData.timestamp', exampleData.timestamp
h5d_close, exampleTab
h5f_close, fid

H5D_READ no longer chokes once it is given a custom datatype, but even the values in the id and value fields come back garbled. This is the output I get from the print statements:

exampleData.id:           0           0           0  1095914052   174536304   153749104           0   172915600  1095914052   910565433
exampleData.value     0.000000     0.000000     0.000000      13.1451     0.000000     0.000000     0.000000     0.000000      13.1451      640.894
exampleData.timestamp                     0                     0                     0   3833484811918717440      5858206660165639             153997792
      5858318295760901             154274128   4051322254670378805      5858331130331138

If I change the struct to what I believe is an equivalent definition, struct = {id:lonarr(1), value:0.0, timestamp:0LL}, the print statements yield:

exampleData.id:   262404320           3   262404416           4   262404512          14           0   172915600  1095914052   910565433
exampleData.value     0.000000     0.000000     0.000000     0.000000     0.000000     0.000000     0.000000     0.000000      13.1451      640.894
exampleData.timestamp                     0                     0                     0   3833484811918717440                     0             153997568
      5858318295760901             154274128   4051322254670378805       791781549539330
bdoering
  • Can you put a file containing a Time64 column somewhere I can download? – mgalloy Apr 08 '14 at 20:11
  • Link to example HDF5 file: http://ubuntuone.com/3agm00xGNP9nKbcYJn90oD – bdoering Apr 09 '14 at 09:15
  • How are the fields garbled? Can we see the `print` output? Do you know if `H5Tpack` has been called on the compounded type? – Timothy Brown Apr 09 '14 at 20:48
  • I wasn't aware of the HDF5 `H5Tpack` function before. I cannot find an equivalent in IDL though, so I assume that it is (should be) called implicitly. In any case, I would still expect to get back a reasonable bit pattern, so that a `struct` with a field of `LON64ARR(1)` or `LONARR(2)` yields the same bits (although they are differently interpreted). – bdoering Apr 10 '14 at 09:00

1 Answer

I have figured out how to read the timestamp column from the data file, which involved reading the PyTables source.

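It looks like the column is actually written as the HDF5 type H5T_UNIX_D64LE. There is also some bit-shifting involved: the upper 32 bits of the 8-byte value hold the seconds and the lower 32 bits hold the microseconds.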
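
As a rough illustration of that bit-shifting, here is a minimal Python sketch of the packing, assuming the seconds/microseconds split that the C convert routine further down implies. The helper names encode_time64 and decode_time64 are made up for this example (they are not part of PyTables or HDF5), and the timestamp value is only illustrative.

```python
def encode_time64(t):
    """Pack a float timestamp into a 64-bit word (assumed layout):
    whole seconds in the upper 32 bits, microseconds in the lower 32 bits."""
    seconds = int(t)
    micros = round((t - seconds) * 1e6) & 0xFFFFFFFF
    return (seconds << 32) | micros


def decode_time64(word):
    """Inverse of encode_time64; mirrors the arithmetic of the C convert()."""
    low = word & 0xFFFFFFFF
    # Treat the low 32 bits as a signed integer, like the (int) cast in C.
    micros = (low ^ 0x80000000) - 0x80000000
    seconds = word >> 32
    return seconds + 1e-6 * micros


# Round trip with an illustrative timestamp (not read from the actual file).
t = 1397033784.5
word = encode_time64(t)
restored = decode_time64(word)
```

The sketch only covers timestamps at or after the epoch; it is meant to show the layout, not to replace the PyTables implementation.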

The following (rtbl.py) is a Python version that dumps the file time64-example.h5:

#!/usr/bin/env python

import tables as T
import time

exampleTableColumns = {
    'id': T.Int32Col(pos=0),
    'value': T.Float32Col(pos=1),
    'timestamp': T.Time64Col(pos=2),
    }
with T.openFile('time64-example.h5', 'r') as h5:
    for row in h5.root.example[:]:
        print "%d\t%.2f\t%s" %(row['id'], row['value'],
            time.ctime(row['timestamp']))

Running it gives the following:

$ env TZ=UTC ./rtbl.py
0   0.00    Wed Apr  9 08:56:24 2014
1   1.00    Wed Apr  9 08:56:25 2014
2   4.00    Wed Apr  9 08:56:25 2014
3   9.00    Wed Apr  9 08:56:26 2014
4   16.00   Wed Apr  9 08:56:26 2014
5   25.00   Wed Apr  9 08:56:27 2014
6   36.00   Wed Apr  9 08:56:27 2014
7   49.00   Wed Apr  9 08:56:28 2014
8   64.00   Wed Apr  9 08:56:28 2014
9   81.00   Wed Apr  9 08:56:29 2014

The same can be done in C (rtbl.c); please excuse the hard-coding and the lack of error checking.

#include <stdlib.h>
#include <stdio.h>
#include <hdf5.h>
#include <time.h>

struct row {
    int id;
    float value;
    double time;
};

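/* Convert the raw 8-byte value to seconds since the epoch:
 * the upper 32 bits hold the seconds, the lower 32 bits the microseconds. */
double
convert(double d) {
    union {
        double d;
        long long i;
    } di;
    double f;

    di.d = d;  /* reinterpret the bits of the double as a 64-bit integer */
    f = 1e-6 * (int)di.i + (di.i >> 32);
    return f;
}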


int
main(int argc, char **argv)
{

    hid_t f_id;
    hid_t d_id;
    hid_t m_type;
    herr_t err;
    int i = 0;
    struct row rows[10] = {0};
    time_t clock;

    f_id = H5Fopen("time64-example.h5", H5F_ACC_RDONLY, H5P_DEFAULT);

    m_type = H5Tcreate(H5T_COMPOUND, sizeof(struct row));
    err = H5Tinsert(m_type, "id",        HOFFSET(struct row, id),
            H5T_STD_I32LE);
    err = H5Tinsert(m_type, "value",     HOFFSET(struct row, value),
            H5T_IEEE_F32LE);
    err = H5Tinsert(m_type, "timestamp", HOFFSET(struct row, time),
            H5T_UNIX_D64LE);

    d_id = H5Dopen(f_id, "example", H5P_DEFAULT);
    err = H5Dread(d_id, m_type, H5S_ALL, H5S_ALL, H5P_DEFAULT, rows);

    for (i = 0; i < 10; ++i) {
        clock = (time_t)convert(rows[i].time);
        /* Same column layout as the Python script: id, value, time. */
        printf("%d\t%.2f\t%s",
                rows[i].id, rows[i].value, ctime(&clock));
    }

    H5Dclose(d_id);
    H5Fclose(f_id);

    return(EXIT_SUCCESS);
}

This yields the same result:

$ h5cc -o rtbl rtbl.c && env TZ=UTC ./rtbl
0   0.00    Wed Apr  9 08:56:24 2014
1   1.00    Wed Apr  9 08:56:25 2014
2   4.00    Wed Apr  9 08:56:25 2014
3   9.00    Wed Apr  9 08:56:26 2014
4   16.00   Wed Apr  9 08:56:26 2014
5   25.00   Wed Apr  9 08:56:27 2014
6   36.00   Wed Apr  9 08:56:27 2014
7   49.00   Wed Apr  9 08:56:28 2014
8   64.00   Wed Apr  9 08:56:28 2014
9   81.00   Wed Apr  9 08:56:29 2014

So it all comes back to the convert routine and H5T_UNIX_D64LE. I dare say that if any of this were done on a big-endian machine, the LEs would have to be BEs.
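
To make the byte-order point concrete, here is a hedged Python sketch that decodes raw 16-byte records with the same field order as struct row (int32 id, float32 value, 8-byte timestamp). It assumes the records are already available as a packed byte buffer with no padding; the function name read_rows and the sample record are invented for this illustration.

```python
import struct


def read_rows(buf, byteorder="<"):
    """Decode packed (id, value, timestamp) records from raw bytes.

    byteorder is '<' for little-endian data and '>' for big-endian data.
    Assumes 16-byte records: int32, float32, then a 64-bit time word.
    """
    rows = []
    for rec_id, value, word in struct.iter_unpack(byteorder + "ifq", buf):
        low = word & 0xFFFFFFFF
        micros = (low ^ 0x80000000) - 0x80000000  # low 32 bits, signed
        rows.append((rec_id, value, (word >> 32) + 1e-6 * micros))
    return rows


# One made-up record, written in both byte orders.
sample = (3, 9.0, (1397033785 << 32) | 500000)
little = read_rows(struct.pack("<ifq", *sample), "<")
big = read_rows(struct.pack(">ifq", *sample), ">")
```

Both calls decode to the same row, which is the sense in which only the LE/BE choice changes between platforms; the conversion arithmetic stays the same.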

I'm sorry, it has been years since I used IDL and I don't have access to it, but I hope this helps.

Timothy Brown
  • Thank you very much for your long answer! It does clear up some of the details of how pytables writes the data in the first place. Nevertheless, I don't see how this solution can help in reading the data from _IDL_. After all, it might be an idea to just incorporate methods written in other languages (C++ or Java) to read (some) HDF5 data in IDL. – bdoering Apr 11 '14 at 21:01
  • Yes, I think the issue is convincing _IDL_ how to read `H5T_UNIX_D64LE`. The `convert` you can easily code in _IDL_. Good luck. – Timothy Brown Apr 11 '14 at 21:27