I recently came across a tool called word2vec. For my current work, I need to find users that are similar to a given user. Each user has associated entities such as age, qualifications, institutes/organisations, languages known, and a list of certain tags. If we treat these entities/columns together as a chunk of words for a user, can we compute a vector for that user and use these vectors to deduce similarities between users? Would wiki-trained vectors help us get meaningful results? Is there any other way to do it?

labyrinth

1 Answer

What you need is a simple unsupervised (or semi-supervised) clustering algorithm. word2vec with its pre-trained vectors may not be very helpful, because the names of institutions, etc. are unlikely to be in its vocabulary.

Also, it seems that the number of "aspects" a user has is small, so you can simply run a clustering algorithm on vector representations where each dimension of your vector space is one of these aspects (age, qualification, organization, etc.).
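As a rough illustration of such a representation, here is a minimal sketch that gives every distinct aspect value its own dimension and ranks users by cosine similarity. The sample users, the field names, and the helpers `build_vocabulary`, `encode` and `most_similar` are all hypothetical, and dividing age by 100 is just one arbitrary way to keep it on a scale comparable to the 0/1 dimensions.

```python
import math

# Hypothetical sample users; field names and values are made up for illustration.
users = {
    "alice": {"age": 25, "qualification": "MSc", "institute": "MIT",
              "languages": {"English", "French"}, "tags": {"python", "ml"}},
    "bob": {"age": 27, "qualification": "MSc", "institute": "MIT",
            "languages": {"English"}, "tags": {"python", "nlp"}},
    "carol": {"age": 52, "qualification": "PhD", "institute": "Oxford",
              "languages": {"German"}, "tags": {"chemistry"}},
}

def aspect_values(user):
    """Return the set of (aspect, value) pairs that describe one user."""
    values = {("qualification", user["qualification"]),
              ("institute", user["institute"])}
    values.update(("language", lang) for lang in user["languages"])
    values.update(("tag", tag) for tag in user["tags"])
    return values

def build_vocabulary(users):
    """Every distinct (aspect, value) pair becomes one dimension."""
    vocab = set()
    for user in users.values():
        vocab.update(aspect_values(user))
    return sorted(vocab)

def encode(user, vocab, max_age=100.0):
    """Scaled age followed by one 0/1 entry per vocabulary dimension."""
    present = aspect_values(user)
    vec = [user["age"] / max_age]  # assumed scaling, not a tuned choice
    vec.extend(1.0 if dim in present else 0.0 for dim in vocab)
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(name, users):
    """Rank all other users by cosine similarity to the named user."""
    vocab = build_vocabulary(users)
    target = encode(users[name], vocab)
    scores = [(other, cosine(target, encode(user, vocab)))
              for other, user in users.items() if other != name]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = most_similar("alice", users)
print(ranking)
```

The same vectors could be handed to any off-the-shelf clustering algorithm instead of being ranked pairwise. Note that this encoding only rewards exact matches: two different qualifications contribute nothing to the similarity, however related they are.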

A continuous space model like word2vec can be helpful if you want the similarity of users to reflect the similarity of these aspects (as opposed to exact equality).

If, for example, you want the qualification "Python expert" to be measured as something close to "Scripting expert", then go for word2vec. But if you are looking for exact matches among a finite predefined number of aspects, go for a simple clustering algorithm.
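To make the difference concrete, here is a small sketch of the continuous-space idea. The three-dimensional `embeddings` table is entirely made up and merely stands in for vectors that a trained word2vec model would supply; `user_vector` is a hypothetical helper that averages the vectors of the words it knows and silently skips out-of-vocabulary words (which is what would happen to most institution names).

```python
import math

# Made-up toy vectors standing in for real word2vec output (assumption:
# a trained model would place "python" and "scripting" close together).
embeddings = {
    "python": [0.9, 0.1, 0.0],
    "scripting": [0.8, 0.2, 0.1],
    "chemistry": [0.0, 0.1, 0.9],
    "expert": [0.1, 0.9, 0.1],
}

def user_vector(words, embeddings, dim=3):
    """Average the vectors of known words; unknown words are skipped."""
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(component) / len(known) for component in zip(*known)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

python_expert = user_vector(["python", "expert"], embeddings)
scripting_expert = user_vector(["scripting", "expert"], embeddings)
chemistry_expert = user_vector(["chemistry", "expert"], embeddings)

close = cosine(python_expert, scripting_expert)
far = cosine(python_expert, chemistry_expert)
print(close, far)
```

With these toy numbers, "Python expert" lands much nearer to "Scripting expert" than to "Chemistry expert", whereas an exact-match encoding would treat all three qualifications as equally different.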

P.S. More detailed Q&A on this topic should be on Cross Validated.

Chthonic Project