How does one transform (e.g., one-hot encode, index, bucketize, embed, etc.) labels natively in TensorFlow? `tf.feature_column` is the preferred way for features, but what about the labels (i.e., targets)? They too often need to be transformed and treated as a layer in the overall Keras pipeline. The problem is that `tf.feature_column` acts only on the features, not the labels.
Consider, for example, this CSV:

```
F1   F2   T
3.7  2.0  A
1.7  3.5  B
6.0  6.6  A
0.7  3.2  A
```
where `F1` and `F2` are the features and `T` is the target. I would then naturally call `make_csv_dataset(..., label_name='T')` to generate my dataset. But how do I then transform the targets so that all data processing is neatly wrapped in a `DenseFeatures` layer?
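For concreteness, here is a minimal sketch (assuming TensorFlow 2.x) of the kind of label transform I currently have to write by hand with `Dataset.map` and `tf.lookup`, rather than as a layer. The in-memory data below is a stand-in for the CSV above; `make_csv_dataset` would yield the same `(features, label)` structure.

```python
import tensorflow as tf

# Stand-in for make_csv_dataset(..., label_name='T') on the CSV above:
# two float feature columns and a string target column.
features = {"F1": [3.7, 1.7, 6.0, 0.7], "F2": [2.0, 3.5, 6.6, 3.2]}
labels = ["A", "B", "A", "A"]
ds = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# A lookup table maps the string targets to integer indices inside
# the tf.data pipeline, and tf.one_hot encodes them -- no pandas.
vocab = ["A", "B"]
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(vocab),
        values=tf.constant([0, 1], dtype=tf.int64)),
    default_value=-1)

def encode_label(feats, label):
    # Transform only the label; the features pass through untouched.
    return feats, tf.one_hot(table.lookup(label), depth=len(vocab))

ds = ds.map(encode_label)
```

This works, but it lives in the input pipeline rather than in the model, which is exactly what I am trying to avoid.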
Has the `tf.data` team at TensorFlow overlooked the fact that labels are often categorical and therefore need to be transformed as well?
EDIT: I would like to avoid any use of pandas, since it does not scale, hence my emphasis on the native tools of `tf.data` (e.g., `make_csv_dataset()` or otherwise).