
In a bag-of-words model, I know we should remove stopwords and punctuation before training. But in an RNN model, if I want to do text classification, should I remove stopwords too?

Nils Cao

2 Answers


This depends on what your model classifies. If you're doing something in which the classification is aided by stop words -- some level of syntax understanding, for instance -- then you need to either leave the stop words in or trim your stop list so that you don't lose that information. For instance, cutting out all verbs of being (is, are, should be, ...) can mess up an NN that depends somewhat on sentence structure.

However, if your classification is topic-based (as suggested by your bag-of-words reference), then treat the input the same way: remove those pesky stop words before they burn valuable training time.
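For that topic-based case, here is a minimal sketch of what the preprocessing might look like. The hand-rolled stopword set is purely illustrative; in practice you would usually take a list from a library such as NLTK.

```python
import re

# Illustrative stopword set, not exhaustive; a library list would normally be used.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "to", "of", "and", "in"}

def preprocess_for_topics(text):
    """Lowercase, drop punctuation, and remove stopwords before building features."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess_for_topics("The cat is sitting on the mat, and the dog barks."))
# ['cat', 'sitting', 'on', 'mat', 'dog', 'barks']
```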

Prune
  • That's not an elaboration; it's a separate question, dependent on your application specifics. Please raise it appropriately. – Prune Sep 26 '18 at 15:58
    `If you working with LSTM’s or other models which capture the semantic meaning and the meaning of a word depends on the context of the previous text, then it becomes important not to remove stopwords.` https://towardsdatascience.com/why-you-should-avoid-removing-stopwords-aa7a353d2a52 – Abhijeet Dec 20 '19 at 06:51
  • Do not remove stopwords when they add information (context-awareness) to the sentence, e.g. text summarization, machine/language translation, language modeling, question answering (see the sketch after this list).
  • Remove stopwords when we only want the general idea of the sentence, e.g. sentiment analysis, language/text classification, spam filtering, caption generation, auto-tag generation, topic/document classification.
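As a rough illustration of the first bullet, below is a sketch of preparing sequences for an RNN/LSTM that keeps stopwords, so word order and function words remain available to the model. The toy vocabulary builder is a stand-in for a real tokenizer (e.g. Keras's `Tokenizer`), and the example sentences are made up.

```python
from collections import defaultdict

def build_vocab(texts):
    """Assign an integer id to every token; 0 is reserved for unknown/padding."""
    vocab = defaultdict(lambda: len(vocab) + 1)
    for text in texts:
        for token in text.lower().split():
            _ = vocab[token]  # touching the key assigns the next id
    return dict(vocab)

def to_sequence(text, vocab):
    """Map tokens to ids, keeping stopwords and the original word order."""
    return [vocab.get(token, 0) for token in text.lower().split()]

texts = ["where is the nearest station", "what is the fastest route"]
vocab = build_vocab(texts)
print(to_sequence("where is the nearest station", vocab))
# [1, 2, 3, 4, 5] -- function words like "is" and "the" keep their place in the sequence
```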
rohan goli