Trying to search 2 lists for common strings. 1-st list being a file with text, while the 2-nd is a list of words with logarithmic probability before the actual word – to match, a word not only needs to be in both lists, but also have a certain minimal log probability (for instance, between -2,123456 and 0,000000; that is negative 2 increasing up to 0). The tab separated list can look like:
-0.962890 dog
-1.152454 lol
-2.050454 cat
I got stuck doing something like this:
common = []
for i in list1:
if i in list2 and re.search("\-[0-1]\.[\d]+", list2):
common.append(i)
The idea to simply preprocess the list to remove lines under a certain threshold is valid of course, but since both the word and its probability are on the same line, isn’t a condition also possible? (Regexps aren’t necessary, but for comparison solutions both with and without them would be interesting.)
EDIT: own answer to this question below.