
In a bag-of-words model, I know we should remove stopwords and punctuation before training. But in an RNN model, if I want to do text classification, should I remove stopwords too?

Nils Cao

2 Answers


This depends on what your model classifies. If you're doing something in which the classification is aided by stop words -- some level of syntax understanding, for instance -- then you need to either leave the stop words in or trim your stop list so that you don't lose that information. For instance, cutting out all verbs of being (is, are, should be, ...) can mess up an NN that depends somewhat on sentence structure.

However, if your classification is topic-based (as suggested by your bag-of-words reference), then treat the input the same way: remove those pesky stop words before they burn valuable training time.
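For that topic-based case, here is a minimal sketch of what the preprocessing might look like. The hand-rolled stopword set is purely illustrative; in practice you would usually take a list from a library such as NLTK.

```python
import re

# Illustrative stopword set, not exhaustive; a library list would normally be used.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "to", "of", "and", "in"}

def preprocess_for_topics(text):
    """Lowercase, drop punctuation, and remove stopwords before building features."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess_for_topics("The cat is sitting on the mat, and the dog barks."))
# ['cat', 'sitting', 'on', 'mat', 'dog', 'barks']
```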

Prune
  • That's not an elaboration; it's a separate question, dependent on your application specifics. Please raise it appropriately. – Prune Sep 26 '18 at 15:58
    `If you working with LSTM’s or other models which capture the semantic meaning and the meaning of a word depends on the context of the previous text, then it becomes important not to remove stopwords.` https://towardsdatascience.com/why-you-should-avoid-removing-stopwords-aa7a353d2a52 – Abhijeet Dec 20 '19 at 06:51
  • Do not remove stopwords when they add information (context-awareness) to the sentence, e.g. text summarization, machine/language translation, language modeling, question answering (see the sketch after this list).
  • Remove stopwords when we only want the general idea of the sentence, e.g. sentiment analysis, language/text classification, spam filtering, caption generation, auto-tag generation, topic/document classification.
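As a rough illustration of the first bullet, below is a sketch of preparing sequences for an RNN/LSTM that keeps stopwords, so word order and function words remain available to the model. The toy vocabulary builder is a stand-in for a real tokenizer (e.g. Keras's `Tokenizer`), and the example sentences are made up.

```python
from collections import defaultdict

def build_vocab(texts):
    """Assign an integer id to every token; 0 is reserved for unknown/padding."""
    vocab = defaultdict(lambda: len(vocab) + 1)
    for text in texts:
        for token in text.lower().split():
            _ = vocab[token]  # touching the key assigns the next id
    return dict(vocab)

def to_sequence(text, vocab):
    """Map tokens to ids, keeping stopwords and the original word order."""
    return [vocab.get(token, 0) for token in text.lower().split()]

texts = ["where is the nearest station", "what is the fastest route"]
vocab = build_vocab(texts)
print(to_sequence("where is the nearest station", vocab))
# [1, 2, 3, 4, 5] -- function words like "is" and "the" keep their place in the sequence
```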
rohan goli