I want to compute n-grams letter by letter (character n-grams) instead of word by word.
Normal (word) n-grams:
Sentence: He want to watch football match
Result:
he, he want, want, want to, to, to watch, watch, watch football, football, football match, match
I want the same thing, but letter by letter:
Word: Angela
Result:
a, an, n, ng, g, ge, e, el, l, la, a
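As a minimal sketch in plain Python (no scikit-learn), both results above follow the same rule: for each starting position, emit the 1-gram and then the 2-gram that starts there. The helper names `interleaved_ngrams`, `letter_ngrams` and `word_ngrams` are hypothetical, chosen for illustration, and the sketch assumes lowercase output and n-grams of size 1 to `n`, as in the examples:

```python
def interleaved_ngrams(seq, n=2):
    """Return the 1-grams up to n-grams of seq, ordered by start position.

    For each start index i, emit seq[i:i+1], seq[i:i+2], ..., seq[i:i+n],
    skipping slices that would run past the end of seq.
    """
    grams = []
    for i in range(len(seq)):
        for size in range(1, n + 1):
            if i + size <= len(seq):
                grams.append(seq[i:i + size])
    return grams


def letter_ngrams(word, n=2):
    # Slicing a string yields substrings, so these are letter n-grams.
    return interleaved_ngrams(word.lower(), n)


def word_ngrams(sentence, n=2):
    # Slicing a list of words yields word lists; join them with spaces.
    words = sentence.lower().split()
    return [" ".join(g) for g in interleaved_ngrams(words, n)]


print(letter_ngrams("Angela"))
# ['a', 'an', 'n', 'ng', 'g', 'ge', 'e', 'el', 'l', 'la', 'a']
print(word_ngrams("He want to watch football match"))
# ['he', 'he want', 'want', 'want to', 'to', 'to watch', 'watch',
#  'watch football', 'football', 'football match', 'match']
```

One function covers both cases because only the sequence type changes: a string for letters, a list of words for word n-grams.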
This is my code using scikit-learn, but it still produces word n-grams rather than letter n-grams:
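from sklearn.feature_extraction.text import CountVectorizer

# The default analyzer is 'word': tokens are whole words, so n-grams combine words
vectorizer = CountVectorizer(ngram_range=(1, 100), token_pattern=r"(?u)\b\w+\b")
corpus = ['Angel', 'Angelica', 'John', 'Johnson']
X = vectorizer.fit_transform(corpus)
analyze = vectorizer.build_analyzer()
print(analyze('Angela'))  # shows the tokens produced for a single input
print(vectorizer.get_feature_names_out())  # get_feature_names() was removed in scikit-learn 1.2
print(vectorizer.transform(['Angela']).toarray())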