I am feeding my Discord server's messages into an RNN so that I can create a chatbot based on them. I know TensorFlow's tf.keras.preprocessing.text.Tokenizer can tokenize at the character level, but I want to include special tokens, because I want the bot to simulate a person who writes several messages in a row on Discord, pressing Enter after each phrase. An example sentence with special tokens would be: '<START> im a riot <ENTER> ok <ENTER> lets see here <END> '

How can I include special tokens like these in this situation? So far, the only way I've found is to use re.findall to separate characters and special tokens (re.findall(r'(?:(?:<[\w]+?>)|(?:[\w.,?!:]))', text)). However, it is slow, and I would prefer a TensorFlow method so that it is portable and can run under graph execution on tf.data Datasets.
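To make the goal concrete, here is a minimal sketch of one way this could be done with core TensorFlow string ops only. It is not an established recipe: the separator character `SEP`, the fixed list of special tokens in `TOKEN_PATTERN`, and the helper name `tokenize` are all assumptions made for illustration. The idea is to mark every token boundary with `tf.strings.regex_replace` and then cut on the marker with `tf.strings.split`, both of which run inside `tf.data.Dataset.map`.

```python
import tensorflow as tf

# Assumption: this control character never occurs in the chat logs,
# so it is safe to use as a temporary token boundary.
SEP = "\x1f"

# Special tokens come first; "any single character" is the fallback.
# (?s) lets "." match newlines as well.
TOKEN_PATTERN = r"(?s)(<START>|<ENTER>|<END>|.)"


def tokenize(text):
    """Split a scalar string tensor into special tokens and single characters."""
    # Put SEP in front of every match; \1 re-inserts the matched token.
    marked = tf.strings.regex_replace(text, TOKEN_PATTERN, SEP + r"\1")
    tokens = tf.strings.split(marked, sep=SEP)
    # The leading SEP produces one empty string; drop it.
    return tf.boolean_mask(tokens, tf.strings.length(tokens) > 0)


messages = tf.data.Dataset.from_tensor_slices(
    ["<START> im a riot <ENTER> ok <END>", "<START> hi <END>"]
)
token_ds = messages.map(tokenize)  # traced once, then executed as a graph

for tokens in token_ds.take(1):
    print([t.decode("utf-8") for t in tokens.numpy()])
```

Unlike the regex in the question, this sketch keeps spaces and every other character as tokens rather than dropping them. The resulting string tokens could then be mapped to integer ids, for example with a `tf.keras.layers.StringLookup` layer whose vocabulary contains the special tokens.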

Elysium

0 Answers