I think you're looking for something like tf.VarLenFeature(). More specifically, you do not have to pad your rows before creating the TFRecord file. You can create the tf_example like this:
import tensorflow as tf
from tensorflow.train import BytesList, Feature, Features, Example, Int64List

tf_example = Example(
    features=Features(
        feature={
            "my_feature": Feature(
                int64_list=Int64List(value=[0, 3, 43, 223, 23])
            )
        }
    )
)

# tf.io.TFRecordWriter (tf.python_io.TFRecordWriter in older TF 1.x releases)
with tf.io.TFRecordWriter(tfrecord_file_path) as tf_writer:
    tf_writer.write(tf_example.SerializeToString())
Do this for each of your rows, which can vary in length, for example:
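Something like this loop should do it (the output path and the example rows here are just placeholders):

    import tensorflow as tf
    from tensorflow.train import Feature, Features, Example, Int64List

    # rows of differing lengths; no padding needed before writing
    rows = [[0, 3, 43, 223, 23], [5, 7], [1, 2, 3]]

    with tf.io.TFRecordWriter("data.tfrecord") as tf_writer:
        for row in rows:
            tf_example = Example(
                features=Features(
                    feature={
                        "my_feature": Feature(int64_list=Int64List(value=row))
                    }
                )
            )
            tf_writer.write(tf_example.SerializeToString())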
You'll parse the tf_examples with something like:
def parse_tf_example(example):
    feature_spec = {
        "my_feature": tf.VarLenFeature(dtype=tf.int64)
    }
    return tf.parse_example([example], features=feature_spec)
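If you're reading the records back with tf.data (just one way of doing it), you could apply that parser like so:

    # assuming the records were written to "data.tfrecord" as above
    dataset = tf.data.TFRecordDataset("data.tfrecord")
    dataset = dataset.map(parse_tf_example)  # each element is a dict of SparseTensors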
Now, this will return your features as tf.SparseTensors. If you don't want to deal with that at this stage and would rather carry on using ordinary tensor ops, you can simply call tf.sparse_tensor_to_dense() and proceed as you normally would with dense tensors.
The returned dense tensors will be of varying lengths, so you shouldn't have to worry about selecting '-1's; there won't be any. The exception is converting the sparse tensors to dense in batches, in which case each batch will be padded to the length of the longest tensor in the batch, and the padding value can be set via the default_value parameter.
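For example, converting a single parsed feature to dense might look like this (serialized_example and the zero default are just for illustration):

    parsed = parse_tf_example(serialized_example)
    dense_feature = tf.sparse_tensor_to_dense(parsed["my_feature"], default_value=0)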
That covers your question about using varying-length rows in TFRecords and getting back varying-length tensors.
With regards to the lookup op, I haven't used it myself, but I think tf.nn.embedding_lookup_sparse() might help you out here. It lets you look up the embeddings directly from the sparse tensor, forgoing the need to convert it to a dense tensor first, and it also has a combiner parameter to specify a reduction op on those embeddings, which in your case would be 'mean'.
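I haven't tried this against your setup, but a rough sketch of that lookup could be (vocab_size, embedding_dim, and the parsed dict are placeholder names carried over from the snippets above):

    vocab_size = 300     # illustrative values
    embedding_dim = 16
    embedding_matrix = tf.get_variable("embeddings", shape=[vocab_size, embedding_dim])

    sparse_ids = parsed["my_feature"]  # the SparseTensor returned by the parser
    embedded = tf.nn.embedding_lookup_sparse(
        embedding_matrix,
        sparse_ids,
        sp_weights=None,   # uniform weights
        combiner="mean"    # average the embeddings for each row
    )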
I hope this helps in some way, good luck.