I am building a course recommendation system. The idea is to build a personalized system in which users see all the courses the system recommends for them when they log in.
All the data I have is four columns:

- username (str)
- position (str, e.g. data engineer)
- skill (str, e.g. python, java, machine learning)
I have already built a content-based model that uses cosine similarity to find the nearest neighbors based on the combination of two columns (position and skill).
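The `rec` code uses a `user['keyword']` column, but how that column is built is not shown. A minimal sketch, assuming it is simply position and skill joined into one lower-case string; the sample rows are hypothetical and stripping the commas is an assumption:

```python
import pandas as pd

# Hypothetical sample rows following the columns described above
user = pd.DataFrame({
    "username": ["alice", "bob", "carol"],
    "position": ["data engineer", "data scientist", "backend developer"],
    "skill": ["python, sql", "python, machine learning", "java, sql"],
})

# Assumption: 'keyword' = position + skill, lower-cased, commas removed,
# so the vectorizer sees one space-separated word sequence per user
user["keyword"] = (
    user["position"].str.lower()
    + " "
    + user["skill"].str.lower().str.replace(",", "", regex=False)
)
print(user["keyword"].tolist())
```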
Please look at the code below:
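```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# `user` is a DataFrame whose 'keyword' column combines position and skill;
# `indices` maps a role to its row position in `user` (both defined elsewhere).
count = CountVectorizer(token_pattern=r"(?u)\b\w+\b",
                        stop_words=None, ngram_range=(2, 2), analyzer='word')
count_matrix = count.fit_transform(user['keyword'])
cosine_sim = cosine_similarity(count_matrix, count_matrix)


def rec(role):
    idx = indices[role]
    # Pair every row with its similarity to the query row, most similar first
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Drop the first entry (normally the query row itself) and keep the next 25
    sim_scores = sim_scores[1:26]
    user_indices = [i[0] for i in sim_scores]
    users = user.iloc[user_indices][['real-skill']]
    return users.head(10)
```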
My question is: I need a metric to evaluate my model, or even the dataset. I have looked at many libraries, and the metrics commonly used for recommender systems are MSE, RMSE, novelty, ...
However, it seems these metrics cannot be computed without more data on user preferences. Is there any way to measure the accuracy of my simple system?
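One workaround when no preference data exists is a leave-one-out proxy: hide one skill from a user's profile, recommend from the remaining profile, and count how often the hidden skill comes back (hit rate@n). Catalogue coverage, the share of all skills that ever get recommended, needs no feedback at all. The sketch below is a hypothetical setup rather than the pipeline above: it treats each skill as the recommendable item, uses made-up profiles, swaps in `ngram_range=(1, 2)` so single skills count as features, and `recommend` / `leave_one_out` are illustrative names.

```python
from collections import Counter

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical profiles: (position, skills) per user
PROFILES = [
    ("data engineer", ["python", "sql", "spark"]),
    ("data engineer", ["python", "sql", "airflow"]),
    ("data scientist", ["python", "statistics", "sql"]),
    ("data scientist", ["python", "statistics", "pandas"]),
    ("backend developer", ["java", "sql", "spring"]),
    ("backend developer", ["java", "spring", "docker"]),
]


def to_text(position, skills):
    return position + " " + " ".join(skills)


def recommend(query, corpus, k=2, n=3):
    """Rank skills held by the k most similar profiles, excluding the query's own."""
    texts = [to_text(p, s) for p, s in corpus] + [to_text(*query)]
    vec = CountVectorizer(token_pattern=r"(?u)\b\w+\b", ngram_range=(1, 2))
    matrix = vec.fit_transform(texts)
    last = matrix.shape[0] - 1  # the query is the final row
    sims = cosine_similarity(matrix[last], matrix[:last]).ravel()
    neighbours = np.argsort(-sims, kind="stable")[:k]
    votes = Counter(s for i in neighbours for s in corpus[i][1] if s not in query[1])
    return [skill for skill, _ in votes.most_common(n)]


def leave_one_out(profiles, k=2, n=3):
    """Hide each skill once; return (hit rate@n, catalogue coverage)."""
    hits = trials = 0
    recommended = set()
    for u, (position, skills) in enumerate(profiles):
        others = profiles[:u] + profiles[u + 1:]  # never match a user with itself
        for hidden in skills:
            visible = [s for s in skills if s != hidden]
            recs = recommend((position, visible), others, k, n)
            recommended.update(recs)
            hits += hidden in recs
            trials += 1
    catalogue = {s for _, skills in profiles for s in skills}
    return hits / trials, len(recommended) / len(catalogue)


hit_rate, coverage = leave_one_out(PROFILES)
print(f"hit rate@3: {hit_rate:.2f}, coverage: {coverage:.2f}")
```

This is only a proxy: it checks whether a skill the user really has can be recovered from similar users, not whether the user would actually want the recommended course.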
Thank you!