You can invert a lemmatise function by applying it to every word in the Scrabble dictionary and grouping words with a common stem in a Python dict.
Of course, the groups will depend strongly on the lemmatise function you have. Below, I use nltk.stem.WordNetLemmatizer.lemmatize, which correctly groups 'science' and 'sciences' under the same stem 'science', but doesn't group 'scientific' with them.
So if you want 'scientific' to land in the same group as 'science', you'll need a more "brutal" lemmatise function, one that maps more words to the same stem.
import nltk  # run nltk.download('wordnet') once if the WordNet data is missing
from nltk.stem import WordNetLemmatizer
wnl = WordNetLemmatizer()
d = {}
with open('scrabble_dict.txt', 'r') as f:
    next(f); next(f)  # skip the two header lines
    for word in f:
        word = word.strip().lower()
        # group each word under the stem the lemmatiser returns for it
        d.setdefault(wnl.lemmatize(word), []).append(word)
print(d['science'])
# ['science', 'sciences']
print(d['scientific'])
# ['scientific']
print([stem for stem in d if stem.startswith('scien')])
# ['science', 'scienced', 'scient', 'scienter', 'sciential', 'scientific', 'scientifical', 'scientifically', 'scientificities', 'scientificity', 'scientise', 'scientised', 'scientises', 'scientising', 'scientism', 'scientisms', 'scientist', 'scientistic', 'scientize', 'scientized', 'scientizes', 'scientizing']
print(d['lemma'])
# ['lemma', 'lemmas', 'lemmata']
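To try out a more "brutal" function, it may help to make the grouping step independent of the lemmatiser. The sketch below is only an illustration: invert_lemmatiser and crude_stem are hypothetical names I made up, and crude_stem (keep the first five letters) is a deliberately silly stand-in, not a real stemmer. The word list is a tiny hand-picked sample rather than the Scrabble dictionary.

```python
from collections import defaultdict

def invert_lemmatiser(lemmatise, words):
    """Group words by whatever stem `lemmatise` returns for them."""
    groups = defaultdict(list)
    for word in words:
        word = word.strip().lower()
        if word:  # ignore blank lines
            groups[lemmatise(word)].append(word)
    return dict(groups)

def crude_stem(word, length=5):
    """Hypothetical, deliberately brutal stand-in: keep the first `length` letters."""
    return word[:length]

# Small hand-picked sample, not the real dictionary file.
words = ['science', 'sciences', 'scientific', 'scientist',
         'lemma', 'lemmas', 'lemmata']

groups = invert_lemmatiser(crude_stem, words)

# To find the relatives of any word, stem it first, then look up its group.
print(groups[crude_stem('sciences')])
# ['science', 'sciences', 'scientific', 'scientist']
print(groups[crude_stem('lemmata')])
# ['lemma', 'lemmas', 'lemmata']
```

Any callable that maps a word to a stem can be passed in, for example wnl.lemmatize from above or the stem method of an nltk stemmer such as nltk.stem.PorterStemmer. Whether a given stemmer actually merges 'scientific' with 'science' is something to check on your own data; a prefix cut like crude_stem will also lump together unrelated words that merely share their first letters.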