On occasion, circumstances require us to do the following:
from keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer(num_words=my_max)
Then, invariably, we chant this mantra:
tokenizer.fit_on_texts(text)
sequences = tokenizer.texts_to_sequences(text)
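For concreteness, here's a minimal runnable version on a toy corpus (the corpus and the concrete num_words value are just placeholders I made up), along with the output I observe:

from keras.preprocessing.text import Tokenizer

texts = ["the cat sat on the mat", "the dog sat"]
tokenizer = Tokenizer(num_words=10)

# The two calls in question:
tokenizer.fit_on_texts(texts)   # seems to build a word -> integer vocabulary
print(tokenizer.word_index)
# {'the': 1, 'sat': 2, 'cat': 3, 'on': 4, 'mat': 5, 'dog': 6}

sequences = tokenizer.texts_to_sequences(texts)  # replaces each word with its index
print(sequences)
# [[1, 3, 2, 4, 1, 5], [1, 6, 2]]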
While I (more or less) understand the overall effect, I can't figure out what each one does on its own, no matter how much research I do (including, obviously, the documentation). I don't think I've ever seen one without the other.
So what does each do? Are there any circumstances where you would use either one without the other? If not, why aren't they simply combined into something like:
sequences = tokenizer.fit_on_texts_to_sequences(text)
Apologies if I'm missing something obvious, but I'm pretty new at this.