I need to preprocess NLP data, so I have to remove stopwords (from the nltk library) from a TensorFlow dataset. I tried several things, like this:
docs = tf.data.Dataset.from_tensor_slices([['Never tell me the odds.'], ["It's a trap!"]])
tokenizer = text.WhitespaceTokenizer()
tokenized_docs = docs.map(lambda x: tokenizer.tokenize(x))
data = tokenized_docs.filter(lambda x: x. not in stop_words)
Or this:
tokens = docs.map(lambda x: tokenizer.tokenize(x))
data = tokens.filter(lambda x: tf.strings.strip(x).ref() not in stopwords)
But neither attempt worked. The first snippet raises an error like: RaggedTensor is unhashable.
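For context, one likely cause: Python's `not in` against a set has to hash its left operand, and the tensors passed to a `tf.data` lambda are not hashable. Separately, `Dataset.filter` drops whole dataset elements, not individual tokens inside an element. A minimal sketch of one possible workaround is below. It compares tokens to the stop words with TensorFlow ops inside `map`. The short `stop_words` list is a hypothetical stand-in for the nltk list, `tf.strings.split` is swapped in for `text.WhitespaceTokenizer` so the example has no extra dependencies, and `remove_stopwords` is an illustrative helper name.

```python
import tensorflow as tf

# Hypothetical stand-in for nltk.corpus.stopwords.words("english").
stop_words = tf.constant(["me", "the", "a", "it's"])

docs = tf.data.Dataset.from_tensor_slices(
    [["Never tell me the odds."], ["It's a trap!"]]
)

def remove_stopwords(doc):
    # doc has shape (1,); split on whitespace and flatten to a 1-D token vector.
    tokens = tf.strings.split(doc).flat_values
    # Lowercase only for the comparison, so the kept tokens retain their casing.
    lowered = tf.strings.lower(tokens)
    # Broadcast to (n_tokens, n_stop_words) and mark tokens matching any stop word.
    matches = tf.equal(lowered[:, tf.newaxis], stop_words[tf.newaxis, :])
    is_stop = tf.reduce_any(matches, axis=1)
    # Keep only the tokens that are not stop words.
    return tf.boolean_mask(tokens, tf.logical_not(is_stop))

data = docs.map(remove_stopwords)

# Materialize as Python strings for inspection.
result = [[t.decode("utf-8") for t in row.numpy()] for row in data]
print(result)
```

Punctuation stays attached to tokens here (for example "odds."), so a token such as "the." would not match "the". Stripping punctuation before the comparison would be a separate step.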