I'm working on a machine learning project. I have user data from an e-commerce website, and I'm predicting future purchases. My model is actually complete, but I want to add a new feature to my dataframe.
I haven't used the users' search term data yet, and I want to use it to improve my classification model.
I'm making purchase predictions for each of the 12 main product categories. I also have product data: I have collected every product name in every category and separated them by category.
So I have 12 huge text files (about 500,000 words each on average) and a dataframe that holds all the search terms for each user (about 10-50 words per user).
Finally, my question: can I vectorize these user search terms and the huge category text files, compare them with something like cosine similarity, and get a score per category that I can use as a feature in my classification dataframe?
For example: I want to vectorize the search terms of user 1472631 and compare them with the vector of product category 6.
My concern is the size of the product category text files.
To summarize, I have the search terms used by every user and the text files of the product categories.
Which vectorization method should I use?
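To show what I mean, here is a minimal sketch of the approach I have in mind, using scikit-learn's TfidfVectorizer and cosine similarity. The category texts and search terms below are toy stand-ins for my real data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for the 12 category text files (one document per category)
category_docs = [
    "phone charger cable headphones speaker",   # category 1
    "shirt jeans jacket sneakers socks",        # category 2
    "novel cookbook biography textbook",        # category 3
]
# Toy stand-in for one user's search terms joined into a single string
user_search_terms = "wireless headphones phone case"

# Fit the vocabulary on the category documents, then project the
# user's search terms into the same vector space
vectorizer = TfidfVectorizer()
category_matrix = vectorizer.fit_transform(category_docs)  # (n_categories, vocab_size)
user_vector = vectorizer.transform([user_search_terms])    # (1, vocab_size)

# One similarity score per category; in my case this would give
# 12 new feature columns for the classification dataframe
scores = cosine_similarity(user_vector, category_matrix)[0]
for i, s in enumerate(scores, start=1):
    print(f"category {i}: {s:.3f}")
```

For the real 500,000-word files, I assume I would cap the vocabulary with TfidfVectorizer's `max_features` or `min_df` parameters to keep the matrices manageable, but I'm not sure this scales.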