I have a pandas DataFrame in which one column contains a single int and the other contains a list of anywhere from 2 to 50 ints.
Here is an example:
EmbedID MappedC
1911 3096611 [610580, 1396024, 1383000, 2480745, 751823, 97...
1912 3096612 [365607, 917990]
1913 3096613 [1067171, 638200, 2192752, 1609109, 1984544, 3...
1914 3096614 [521163, 217279, 347655]
1915 3096615 [1139429, 1254616, 3034840, 2312074, 68243]
The number in EmbedID
serves as the label, and two numbers chosen at random from the MappedC
column serve as the corresponding inputs.
What's the best way to convert this into a tf.record file?
I see guides for converting a single numpy column to a tf.record file, such as these:
https://gist.github.com/swyoon/8185b3dcf08ec728fb22b99016dd533f
Numpy to TFrecords: Is there a more simple way to handle batch inputs from tfrecords?
http://www.machinelearninguru.com/deep_learning/tensorflow/basics/tfrecord/tfrecord.html
However, they all have trouble when the column / array has a varying number of ints.
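For what it's worth, `tf.train.Int64List` accepts a list of any length, so the varying size of MappedC does not have to be a problem: each row can be written as one `tf.train.Example` and parsed back with `tf.io.VarLenFeature`. Here is a minimal sketch under that assumption (the toy `df`, the feature names `embed_id`/`mapped_c`, and the file name `pairs.tfrecord` are all my own placeholders, not anything from the guides above):

```python
import pandas as pd
import tensorflow as tf

# Toy frame matching the shape above (values taken from the example rows)
df = pd.DataFrame({
    "EmbedID": [3096612, 3096614],
    "MappedC": [[365607, 917990], [521163, 217279, 347655]],
})

def make_example(embed_id, mapped):
    """Pack one row into a tf.train.Example; Int64List takes a list of any length."""
    feature = {
        "embed_id": tf.train.Feature(int64_list=tf.train.Int64List(value=[embed_id])),
        "mapped_c": tf.train.Feature(int64_list=tf.train.Int64List(value=list(mapped))),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Write each DataFrame row as one serialized Example
with tf.io.TFRecordWriter("pairs.tfrecord") as writer:
    for row in df.itertuples(index=False):
        writer.write(make_example(row.EmbedID, row.MappedC).SerializeToString())

# Reading back: VarLenFeature handles the ragged mapped_c column,
# yielding a SparseTensor whose .values holds the original ints
feature_spec = {
    "embed_id": tf.io.FixedLenFeature([], tf.int64),
    "mapped_c": tf.io.VarLenFeature(tf.int64),
}
dataset = tf.data.TFRecordDataset("pairs.tfrecord").map(
    lambda rec: tf.io.parse_single_example(rec, feature_spec))
```

This keeps the full MappedC list in each record, so the two-at-random selection can happen later in the input pipeline rather than being baked into the file.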
Edit:
If this changes anything, here are more details about what exactly I am doing with the data.
For training in TensorFlow, the single-int column contains an index for a vector in an embedding matrix. That vector will be used as the label.
The column with multiple ints holds the 'input data'. For each label from the single-int column, 2 numbers will be chosen at random from the column containing multiple ints.
I am basically doing word2vec CBOW-style training.
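The sampling step described above (two random context ids per label) can be sketched in plain Python; the toy `df` below is my own stand-in for the real frame, and `random.sample` is just one way to draw two distinct values:

```python
import random
import pandas as pd

# Stand-in frame with the same two-column shape as the question's data
df = pd.DataFrame({
    "EmbedID": [3096612, 3096614],
    "MappedC": [[365607, 917990], [521163, 217279, 347655]],
})

# For each label, draw 2 distinct context ids from its MappedC list,
# giving CBOW-style (inputs, label) training pairs
pairs = [
    (random.sample(mapped, 2), embed_id)
    for embed_id, mapped in zip(df["EmbedID"], df["MappedC"])
]
```

Since every MappedC list has at least 2 entries, `random.sample(mapped, 2)` never fails; doing this sampling at training time (e.g. in a `tf.data` map step) rather than once up front gives fresh pairs each epoch.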