I am trying to compute tf-idf on a plain-text corpus of about 200k tokens. I first produced a count vector (the term frequencies), then produced the tf-idf matrix and got the results below. My code is:
from sklearn.feature_extraction.text import TfidfVectorizer

# Raw string so the backslash in the Windows path is not treated as an escape
with open(r"D:\history.txt", encoding="utf8") as infile:
    contents = infile.readlines()  # each line becomes one document (one matrix row)

# Define vectorizer parameters
tfidf_vectorizer = TfidfVectorizer(max_df=1.0, max_features=200000,
                                   min_df=0.0,
                                   use_idf=True, ngram_range=(1, 3))
tfidf_matrix = tfidf_vectorizer.fit_transform(contents)  # fit the vectorizer to contents
print(tfidf_matrix)
Results:
(0, 8371) 0.0296607326158
(0, 27755) 0.159032195629
(0, 59369) 0.0871403881289
: :
(551, 64746) 0.0324104689629
(551, 10118) 0.0324104689629
(551, 9308) 0.0324104689629
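A minimal sketch of where these tuples come from, assuming a tiny hypothetical in-memory corpus in place of D:\history.txt: printing a SciPy sparse matrix lists each nonzero entry as (row, column) value. The row is the index of a line returned by readlines(), and the column is the index the vectorizer assigned to an n-gram in its vocabulary_ dict.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the lines read from the history file
contents = ["the good old days", "good news travels fast"]

vectorizer = TfidfVectorizer(ngram_range=(1, 3))
tfidf_matrix = vectorizer.fit_transform(contents)

# The result is a SciPy sparse matrix; COO format exposes the
# (row, column, value) triples that print(tfidf_matrix) shows
coo = tfidf_matrix.tocoo()
for row, col, value in zip(coo.row, coo.col, coo.data):
    print((int(row), int(col)), float(value))

# vocabulary_ maps each n-gram (str) to its column index (int)
print(vectorizer.vocabulary_["good"])
```

So the second number in each printed tuple is only a column position; the term itself lives in the vectorizer, not in the matrix.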
However, I want the results in the following form, with the term shown instead of its column index:
(551, good ) 0.0324104689629