I have a dataset with one description column (an IT trouble ticket description) and one target column (the group the ticket belongs to, e.g. Group 0 or Group 1; what each group represents, e.g. access issues, is not provided).
The thing is: the target has 45 different classes (Group 0, Group 1, ..., Group 45). There is a pretty long tail, with some of these groups having less than 0.1% of the total tickets. Instead of just clubbing all the small groups together into a single group, I wanted to see if there is a way to club each smaller group with another group that is 'similar' to it based on the IT trouble ticket descriptions. For example, if a larger group has tickets describing access issues and a smaller group has tickets about login issues (judging by the text descriptions), I would prefer to club these two groups together.
I thought of training a separate Word2Vec or GloVe embedding for each group, but I can't figure out how to measure similarity between the resulting vectors. Further, training 45 different Word2Vec embeddings is pretty computationally painful. So I am a little stuck on this. Any ideas on how to approach this? Any help would be great.
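To make the goal concrete, here is a rough sketch of the kind of comparison I have in mind. It is not the per-group Word2Vec idea: it swaps in plain word counts over one shared vocabulary and compares groups by cosine similarity. The toy tickets, the `min_share` threshold and the helper names (`group_profiles`, `cosine`, `merge_map`) are all made up for illustration, and I am not sure this is the right approach.

```python
import math
from collections import Counter, defaultdict

def group_profiles(tickets):
    """Sum the word counts of all descriptions in each group (one shared vocabulary)."""
    profiles = defaultdict(Counter)
    for text, group in tickets:
        profiles[group].update(text.lower().split())
    return profiles

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def merge_map(tickets, min_share=0.001):
    """Map each small group (share of tickets < min_share) to its most similar large group."""
    sizes = Counter(group for _, group in tickets)
    total = sum(sizes.values())
    small = [g for g, n in sizes.items() if n / total < min_share]
    large = [g for g in sizes if g not in small]
    if not large:  # nothing to merge into
        return {}
    profiles = group_profiles(tickets)
    return {s: max(large, key=lambda g: cosine(profiles[s], profiles[g])) for s in small}

# Hypothetical toy data; min_share is set far higher than 0.1% so the tiny example has a "small" group.
toy_tickets = (
    [("cannot access shared drive permission denied", "Group 0")] * 3
    + [("printer out of paper jam", "Group 1")] * 3
    + [("login access denied password", "Group 2")]
)
print(merge_map(toy_tickets, min_share=0.2))
```

On the toy data, Group 2 shares words only with Group 0, so it gets mapped there. On the real data I assume the word counts would need TF-IDF weighting or averaged word vectors from a single shared embedding, but the comparison step would presumably look similar.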
Thanks!