I want to train my word2vec models on the HPC cluster provided by my university. However, I have been told that, to optimize storage on the cluster, I must convert my data to HDF5 and upload that to the cluster instead. My data consists of plain .txt files (the text I want to train word2vec on). How am I supposed to convert .txt files to HDF5?
I have been searching the documentation but cannot find a tool for converting .txt files. Is there one, or should I write my own script?
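
For reference, the kind of script I imagine writing would look roughly like the sketch below. It assumes Python with the `h5py` package (version 3 or later) is acceptable, that the files are UTF-8, and that storing each file as one variable-length string is a reasonable layout. The function name `txt_dir_to_hdf5` and the dataset names `texts` and `filenames` are my own placeholders, not anything the cluster prescribes.

```python
from pathlib import Path

import h5py


def txt_dir_to_hdf5(txt_dir, h5_path):
    """Store every .txt file in txt_dir as one variable-length string in an HDF5 file.

    Hypothetical helper: the layout (one string per file) is an assumption.
    """
    paths = sorted(Path(txt_dir).glob("*.txt"))
    # Variable-length UTF-8 string type provided by h5py
    str_dtype = h5py.string_dtype(encoding="utf-8")
    with h5py.File(h5_path, "w") as f:
        texts = f.create_dataset("texts", shape=(len(paths),), dtype=str_dtype)
        names = f.create_dataset("filenames", shape=(len(paths),), dtype=str_dtype)
        for i, path in enumerate(paths):
            texts[i] = path.read_text(encoding="utf-8")
            names[i] = path.name
    return len(paths)
```

On the cluster I would then read the text back with something like `f["texts"].asstr()[i]` and tokenize it before passing it to word2vec, but I do not know whether this is the intended approach.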