I'm working on a machine learning project. I have user data from an e-commerce website, and I'm predicting future purchases. My model is actually complete, but I want to add a new feature to my dataframe.
I haven't used the users' search term data yet, and I want to use it to improve my classification model.
I'm making purchase predictions for each of the 12 main product categories. I also have product data: I have collected every product name in every category and separated them by category.
So I have 12 huge text files (about 500,000 words each on average) and a dataframe that holds all the search terms for each user (about 10-50 words per user).
Finally, my question: can I vectorize these user search terms and the huge category text files, compare them with something like cosine similarity, and get a score per category that I can use as a feature in my classification dataframe?
For example: I want to vectorize the search terms of user 1472631 and compare them with the vector of product category 6.
My concern is the size of the product category text files.
To summarize, I have the search terms used by every user and the text files of the product categories.
Which vectorization method should I use?
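To show what I mean, here is a minimal sketch of the approach I have in mind, using scikit-learn's TfidfVectorizer and cosine similarity. The category texts and search terms below are toy stand-ins for my real data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for the 12 category text files (one document per category)
category_docs = [
    "phone charger cable headphones speaker",   # category 1
    "shirt jeans jacket sneakers socks",        # category 2
    "novel cookbook biography textbook",        # category 3
]
# Toy stand-in for one user's search terms joined into a single string
user_search_terms = "wireless headphones phone case"

# Fit the vocabulary on the category documents, then project the
# user's search terms into the same vector space
vectorizer = TfidfVectorizer()
category_matrix = vectorizer.fit_transform(category_docs)  # (n_categories, vocab_size)
user_vector = vectorizer.transform([user_search_terms])    # (1, vocab_size)

# One similarity score per category; in my case this would give
# 12 new feature columns for the classification dataframe
scores = cosine_similarity(user_vector, category_matrix)[0]
for i, s in enumerate(scores, start=1):
    print(f"category {i}: {s:.3f}")
```

For the real 500,000-word files, I assume I would cap the vocabulary with TfidfVectorizer's `max_features` or `min_df` parameters to keep the matrices manageable, but I'm not sure this scales.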