I have a list of strings (words like), and, while I am parsing a text, I need to check if a word belongs to the group of words of my current list.
However, my input is pretty big (about 600 millions lines), and checking if an element belongs to a list is a O(n) operation according to the Python documentation.
My code is something like:
words_in_line = []
for word in line:
if word in my_list:
words_in_line.append(word)
As it takes too much time (days actually), I wanted to improve that part which is taking most of the time. I have a look at Python collections, and, more precisely, at deque. However, the only give a O(1) operation time access to the head and the tail of a list, not in the middle.
Do someone has an idea about how to do that in a better way?