I have the following two datasets (I have several of these tuples):

filename_string: "something"
filename_list: [1,2,3,4,5] # this is a numpy array.

I'd like to know how to write this in a compact format via h5py. The goal is for the end user to read the resulting HDF5 file and be able to recover each list together with its corresponding filename.

I am able to write the NumPy array to HDF5 efficiently, but strings seem to be a big problem: the write errors out as soon as I include one.

Any help would be great - wasted a few hours looking for a solution!

JohnJ
  • I can imagine naming a `dataset` "something". Or assigning the `filename_string` as an attribute of the dataset. – hpaulj Jul 28 '20 at 16:34
  • Normally when people have errors, we expect to see the problem code and the full error message. It's usually easier to help with specific problems, than to suggest a whole new approach that the poster might have already tried. – hpaulj Jul 28 '20 at 16:37
  • Ditto on "share your code". How are you writing the string? As an attribute? Or in an array with a string dtype, or a record array (where the field dtype is a string)? – kcw78 Jul 28 '20 at 21:39
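The attribute idea from the comments above could look roughly like the sketch below. The dataset name `data_0`, the attribute key `filename`, and the temporary file path are placeholders chosen for illustration; they do not come from the question.

```python
import os
import tempfile

import h5py
import numpy as np

# Sample data from the question; the file path is a throwaway temp location.
filename_string = "something"
filename_list = np.array([1, 2, 3, 4, 5])
path = os.path.join(tempfile.mkdtemp(), "attrs_example.h5")

with h5py.File(path, "w") as h5f:
    # "data_0" is a hypothetical dataset name.
    dset = h5f.create_dataset("data_0", data=filename_list)
    # Attach the string to the dataset as metadata.
    dset.attrs["filename"] = filename_string

with h5py.File(path, "r") as h5f:
    dset = h5f["data_0"]
    stored_name = dset.attrs["filename"]  # the string attribute
    stored_values = dset[()]              # the whole dataset as a NumPy array
```

This keeps the string and the array together without forcing the string into the numeric array, which is one way to avoid dtype errors when writing.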

1 Answer

This short snippet creates a dataset named `something` (taken from the variable `filename_string`) that contains the data in your list `filename_list`.

import h5py

filename_string = "something"
filename_list = [1, 2, 3, 4, 5]

# The dataset name carries the string; the dataset contents carry the list.
with h5py.File('SO_63137136.h5', 'w') as h5f:
    h5f.create_dataset(filename_string, data=filename_list)
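Since the question mentions several such pairs and a reader who must recover them, here is a minimal sketch of the same idea extended to a loop, followed by reading everything back. The second pair and the temporary file path are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical name/array pairs standing in for the "several tuples".
pairs = {
    "something": np.array([1, 2, 3, 4, 5]),
    "something_else": np.array([6, 7, 8]),
}
path = os.path.join(tempfile.mkdtemp(), "pairs_example.h5")

# Write one dataset per pair, named after the string.
with h5py.File(path, "w") as h5f:
    for name, values in pairs.items():
        h5f.create_dataset(name, data=values)

# Read back: each dataset name is the string, each dataset body is the array.
recovered = {}
with h5py.File(path, "r") as h5f:
    for name, dset in h5f.items():
        recovered[name] = dset[()]
```

One caveat: a name containing `/` is interpreted as a path through groups, so real file paths would probably need to be sanitized or stored as an attribute instead.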
kcw78
  • Yes, this is what I did eventually. I was initially attempting a harder challenge: writing a tuple to HDF5 that looks like `("the_string", [the_list_elements])`. Anyway, it seems this is not possible. Thanks for your answer! – JohnJ Aug 02 '20 at 13:31
  • If you reorganize your data, you can write it as an HDF5 dataset. HDF5 supports mixed datatypes in a dataset (written from a NumPy recarray). This assumes the field that holds "the_string" can be a fixed-length string, and that `[the_list_elements]` can be converted to a NumPy array of a common dtype (int, float or string). – kcw78 Aug 03 '20 at 13:55
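The record-array approach described in the last comment might be sketched as follows. It assumes every name fits in 20 bytes and every list has exactly 5 integers; the field names `name` and `values`, the dataset name `records`, the second row, and the temporary path are all hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

# Compound dtype: a fixed-length byte string plus a fixed-size integer subarray.
# The 20-byte limit and the length of 5 are assumptions for this sketch.
record_dtype = np.dtype([("name", "S20"), ("values", "i8", (5,))])

records = np.array(
    [
        (b"something", [1, 2, 3, 4, 5]),
        (b"another", [6, 7, 8, 9, 10]),
    ],
    dtype=record_dtype,
)
path = os.path.join(tempfile.mkdtemp(), "records_example.h5")

# Each row of the dataset is one (string, list) tuple.
with h5py.File(path, "w") as h5f:
    h5f.create_dataset("records", data=records)

with h5py.File(path, "r") as h5f:
    table = h5f["records"][()]  # structured NumPy array

first_name = table[0]["name"].decode("utf-8")
first_values = table[0]["values"].tolist()
```

The trade-off is that names longer than the fixed width are truncated and all lists must share one length, so this only suits data that is regular in both respects.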