
I just started using PyTorch for NLP. I found a tutorial that uses from keras.preprocessing.text import one_hot and converts text to a one-hot representation given a vocabulary size.

For example:

The input is:

from keras.preprocessing.text import one_hot

vocab_size = 10000
sentence = ['the glass of milk',
            'the cup of tea',
            'I am a good boy']

onehot_repr = [one_hot(words, vocab_size) for words in sentence]

The output is"

[[6654, 998, 8896, 1609], [6654, 998, 1345, 879], [123, 7653, 1, 5678, 7890]]

How can I perform the same procedure in PyTorch and get output like the above?


1 Answer


PyTorch fundamentally works with tensors and is not designed to work with strings. You can, however, use scikit-learn's LabelEncoder to encode your words as integer indices:

from sklearn import preprocessing

# `sentence` is the same list of strings as in the question
le = preprocessing.LabelEncoder()
le.fit([w for s in sentence for w in s.split()])  # learn an integer id for every unique word

onehot_repr = [le.transform(s.split()) for s in sentence]
>>> [array([10,  5,  8,  7]), array([10,  4,  8,  9]), array([0, 2, 1, 6, 3])]
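
If you then want these indices as PyTorch tensors (e.g. to feed an nn.Embedding layer), or as actual one-hot vectors, here is a minimal sketch building on the same LabelEncoder. It assumes the sentence list from the question and uses torch.nn.functional.one_hot for the expanded vectors:

import torch
from sklearn import preprocessing

sentence = ['the glass of milk',
            'the cup of tea',
            'I am a good boy']

le = preprocessing.LabelEncoder()
le.fit([w for s in sentence for w in s.split()])

# Integer index sequences as LongTensors (sentences have different lengths,
# so keep them in a list rather than stacking into one tensor)
index_repr = [torch.tensor(le.transform(s.split()), dtype=torch.long)
              for s in sentence]

# Expand to true one-hot vectors if you need them, one row per word
onehot_vectors = [torch.nn.functional.one_hot(t, num_classes=len(le.classes_))
                  for t in index_repr]

Note that Keras' one_hot uses the hashing trick against a fixed vocab_size, while LabelEncoder learns an explicit vocabulary, so the particular integers will differ even though both give you one index per word.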