

I need to perform a binary search on a frozenset, but since frozensets don't support indexing, I can't use the bisect library. I thought of converting the frozenset to a list to make things easier, but the conversion (list(frozenset)) scrambles the order, and then I can't perform a binary search. What solution do you suggest?
To be clearer, here is exactly what I'm doing: in an NLP task, I need to remove stopwords from my text, so I imported the stopwords from scikit-learn (in my opinion it has a better collection of stopwords than NLTK):
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
And it returns a frozenset in which the stopwords are in alphabetical order. Now, to remove stopwords from my text, I thought it would be best to check whether each token is a stopword using binary search (since the stopwords are already in alphabetical order, binary search should be efficient). So my code is as follows:

import bisect

bisect.bisect(ENGLISH_STOP_WORDS, word)

And this is where I'm stuck! I expected the above code to give me the word's index in the stopwords list, so that I could then compare my word with the entries before and after it. Instead, I get this error: TypeError: 'frozenset' object does not support indexing.

FYI, I have not tried other libraries' stopword lists (spaCy, gensim, etc.), so I don't know whether they work better in this case. But the main point here is to learn how to handle binary search on a frozenset. Thanks in advance.
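For the binary-search part of the question, here is a minimal sketch. bisect needs an indexable sequence, so the frozenset has to become a list first, and that list should come from sorted() rather than list(), because a frozenset's iteration order is arbitrary. STOP_WORDS below is a small hypothetical stand-in for ENGLISH_STOP_WORDS, and is_stop_word_bisect is an illustrative helper name, not part of any library.

```python
import bisect

# Hypothetical stand-in for sklearn's ENGLISH_STOP_WORDS (a frozenset of strings).
STOP_WORDS = frozenset({"a", "an", "and", "is", "of", "the", "to"})

# sorted() builds a new list in ascending order; list() would keep the set's
# arbitrary iteration order, which is useless for bisect.
SORTED_STOP_WORDS = sorted(STOP_WORDS)


def is_stop_word_bisect(word, sorted_words=SORTED_STOP_WORDS):
    """Binary-search membership test on an already sorted list."""
    i = bisect.bisect_left(sorted_words, word)
    # bisect_left returns the insertion point; the word is present only if
    # the element at that position is equal to it.
    return i < len(sorted_words) and sorted_words[i] == word


print(is_stop_word_bisect("the"))  # True
print(is_stop_word_bisect("cat"))  # False
```

Sorting costs O(n log n) once, and each lookup is then O(log n). As the comments point out, though, a plain `in` check on the frozenset is simpler and usually at least as fast.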

  • `it returns a frozenset in which the stopwords are in alphabetical order` is a surprising sentence. Sets and frozensets are *unordered* collections – Sylvaus May 27 '20 at 13:52
  • You don't NEED to do a binary search on a set. Sets directly support efficient membership testing via the `in` operator, that's the whole point of them! – jasonharper May 27 '20 at 13:53
  • @jasonharper I didn't know that. Thanks for pointing it out. – Arash Ashrafzadeh May 27 '20 at 14:27
  • For those interested, I found [this video](https://www.youtube.com/watch?v=C4Kc8xzcA68), which my friend @amirhossein sent me, really helpful. – Arash Ashrafzadeh May 28 '20 at 06:16
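To see the point about membership testing in practice, here is a rough sketch that first checks that a set lookup and a binary search on a sorted copy agree, then times both with timeit. The stopword set is a hypothetical stand-in for ENGLISH_STOP_WORDS, and timings depend on your machine and Python version, so no numbers are shown.

```python
import bisect
import timeit

# Hypothetical stopword set standing in for ENGLISH_STOP_WORDS.
STOP_WORDS = frozenset({"a", "an", "and", "is", "of", "on", "the", "to"})
SORTED_STOP_WORDS = sorted(STOP_WORDS)


def in_set(word):
    # Hash-based lookup: average O(1), no ordering required.
    return word in STOP_WORDS


def in_sorted_list(word):
    # Binary search on the sorted copy: O(log n) comparisons.
    i = bisect.bisect_left(SORTED_STOP_WORDS, word)
    return i < len(SORTED_STOP_WORDS) and SORTED_STOP_WORDS[i] == word


words = ["the", "cat", "sat", "on", "a", "mat"]

# Both approaches must give identical answers before comparing speed.
assert [in_set(w) for w in words] == [in_sorted_list(w) for w in words]

for fn in (in_set, in_sorted_list):
    seconds = timeit.timeit(lambda: [fn(w) for w in words], number=100_000)
    print(f"{fn.__name__}: {seconds:.3f} s")
```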

1 Answer


If you want to know if the word is a stop word, simply do:

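from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

if word in ENGLISH_STOP_WORDS:  # hash-based membership test, no sorting needed
    pass  # handle the stopword here (e.g. skip it)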
Sylvaus
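For the original NLP task, here is a minimal sketch of stopword removal built on this answer's `in` check. It assumes naive lowercase-and-split tokenization (a real pipeline would likely use a proper tokenizer), and STOP_WORDS is a small hypothetical stand-in for ENGLISH_STOP_WORDS; with scikit-learn installed you could pass ENGLISH_STOP_WORDS instead. remove_stop_words is an illustrative name, not a library function.

```python
# Hypothetical stand-in for sklearn's ENGLISH_STOP_WORDS.
STOP_WORDS = frozenset({"a", "an", "and", "is", "of", "on", "the", "to"})


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Drop stopword tokens, assuming naive whitespace tokenization."""
    tokens = text.lower().split()
    # `not in` on a frozenset is a hash lookup, so no sorting or bisect is needed.
    return [tok for tok in tokens if tok not in stop_words]


print(remove_stop_words("The cat sat on the mat"))
# → ['cat', 'sat', 'mat']
```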