
When trying to get the labels (unique values) of a column in order to one-hot encode it with TensorFlow:

import tensorflow as tf
import pandas as pd
import numpy as np

# some data
d={'column1':['a', 'b', 'c', 'd'], 'column2':['e', 'f', 'g', 'h'], 'column3':[1, 2, 3, 4]}


# convert from pandas df to TensorSliceDataset
df=pd.DataFrame(d)
ds = tf.data.Dataset.from_tensor_slices(dict(df))

# convert to specific feature_column
just_types = tf.feature_column.categorical_column_with_vocabulary_list(
      'column1', ds.column1.unique())

# apply one hot encoding
type_one_hot = tf.feature_column.indicator_column(just_types)
type_one_hot

the following error is raised:

AttributeError: 'TensorSliceDataset' object has no attribute 'column1'

I know this is possible with pandas, but is it possible to do the one-hot encoding in TensorFlow and then convert the result back to a pandas DataFrame, so that it looks something like this?

#   column1_a  column1_b  column1_c  column1_d
#       1           0          0        0
#       0           1          0        0
#       0           0          1        0
#       0           0          0        1
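A minimal sketch of one way to produce that table, assuming TensorFlow 2.6 or later (where `tf.keras.layers.StringLookup` is available). The vocabulary is taken from the pandas column (`df['column1'].unique()`), because a `TensorSliceDataset` has no pandas-style column attributes, which is what the `AttributeError` above is about. Names such as `vocab`, `lookup` and `one_hot_df` are illustrative only.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Same toy data as in the question
d = {'column1': ['a', 'b', 'c', 'd'],
     'column2': ['e', 'f', 'g', 'h'],
     'column3': [1, 2, 3, 4]}
df = pd.DataFrame(d)

# Take the vocabulary from the pandas column, not from the tf.data.Dataset
vocab = sorted(df['column1'].unique().tolist())

# StringLookup maps each string to its vocabulary index; with
# num_oov_indices=0 there is no out-of-vocabulary slot, so each
# one-hot row has exactly len(vocab) entries.
lookup = tf.keras.layers.StringLookup(
    vocabulary=vocab, num_oov_indices=0, output_mode='one_hot')
one_hot = lookup(tf.constant(df['column1'].tolist()))

# Back to pandas, with column names like column1_a, column1_b, ...
one_hot_df = pd.DataFrame(
    np.asarray(one_hot).astype(int),
    columns=[f'column1_{v}' for v in vocab])
print(one_hot_df)
```

Setting `num_oov_indices=0` is a design choice for this closed vocabulary; with the default of one OOV index, the output would gain an extra leading column for unknown strings.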
  • Can you explain a bit more what exactly you would like to achieve? – AloneTogether Oct 15 '21 at 14:03
  • Yes, I wonder if it is possible to one-hot encode a specific pandas DataFrame column with TensorFlow and then convert it back from TensorFlow to pandas. – AlSub Oct 15 '21 at 15:49

1 Answer


In my opinion, using from_tensor_slices is overkill for this example. Just to point out two functions that may help, first tfds.as_dataframe:

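import tensorflow as tf
import pandas as pd
import tensorflow_datasets as tfds

# same dict d as in the question
d = {'column1': ['a', 'b', 'c', 'd'], 'column2': ['e', 'f', 'g', 'h'], 'column3': [1, 2, 3, 4]}
df = pd.DataFrame(d)
ds = tf.data.Dataset.from_tensor_slices(dict(df))

# going back to a pandas DataFrame
df_reversed = tfds.as_dataframe(ds)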
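If installing `tensorflow_datasets` is not an option, a similar round trip can be sketched with plain `tf.data` by iterating the dataset as NumPy values. This is an alternative technique, not what `tfds.as_dataframe` does internally; it assumes string columns come back as `bytes` (which is how `as_numpy_iterator` returns `tf.string` scalars) and decodes them as UTF-8. `df_back` is an illustrative name.

```python
import pandas as pd
import tensorflow as tf

d = {'column1': ['a', 'b', 'c', 'd'],
     'column2': ['e', 'f', 'g', 'h'],
     'column3': [1, 2, 3, 4]}
ds = tf.data.Dataset.from_tensor_slices(dict(pd.DataFrame(d)))

# Each element is a dict of NumPy values; tf.string scalars arrive as bytes.
rows = list(ds.as_numpy_iterator())
df_back = pd.DataFrame(rows)

# Decode the bytes columns back to Python str
for col in df_back.columns:
    if df_back[col].dtype == object:
        df_back[col] = df_back[col].str.decode('utf-8')
print(df_back)
```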

And second, the CategoryEncoding layer for the one-hot step:

# maps integer ids in [0, num_tokens) to one-hot rows
layer = tf.keras.layers.CategoryEncoding(num_tokens=4, output_mode="one_hot")
layer([1, 2, 0, 3])
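CategoryEncoding expects integer ids rather than strings, so applying it to `column1` needs a string-to-id step first. A minimal sketch, assuming pandas' categorical codes are acceptable for that step (any consistent mapping would do) and TensorFlow 2.6 or later; `cat`, `ids` and `one_hot_df` are illustrative names:

```python
import numpy as np
import pandas as pd
import tensorflow as tf

df = pd.DataFrame({'column1': ['a', 'b', 'c', 'd']})

# Turn the strings into integer ids in [0, num_tokens); pandas sorts the
# categories, so here a->0, b->1, c->2, d->3.
cat = pd.Categorical(df['column1'])
ids = cat.codes.astype('int64')

layer = tf.keras.layers.CategoryEncoding(
    num_tokens=len(cat.categories), output_mode='one_hot')
encoded = layer(ids)

# Back to pandas with the column names from the question
one_hot_df = pd.DataFrame(
    np.asarray(encoded).astype(int),
    columns=[f'column1_{c}' for c in cat.categories])
print(one_hot_df)
```

Deriving `num_tokens` from the number of categories, instead of hard-coding 4, keeps the layer in sync with the data if new values appear in the column.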
Damir Devetak