
I am working to set up data for an unsupervised learning algorithm. The goal of the project is to group (cluster) customers based on their behavior on the website. Some sort of clustering algorithm seems best for discovering patterns in the data that we can't see as humans.

However, the database contains multiple rows for each customer (in chronological order), one for each action the customer took on the website during a visit. For example, the customer with ID #123 clicked on page 1 at time X, which is one row in the database; the same customer then clicked another page at time Y, which is another row.
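To make the layout concrete, here is a minimal sketch (not part of the original post) of such an event log and of collecting each customer's rows into one time-ordered sequence. The `(customer_id, timestamp, page)` row layout, the sample values, and the helper name `sequences_by_customer` are hypothetical.

```python
from collections import defaultdict

# Hypothetical event log: one row per action, as described above.
# The (customer_id, timestamp, page) layout is an assumption for illustration.
rows = [
    (123, 10, "page1"),
    (456, 11, "home"),
    (123, 15, "page2"),
    (456, 12, "cart"),
    (123, 20, "checkout"),
]

def sequences_by_customer(rows):
    """Group rows by customer ID and return each customer's pages in time order."""
    grouped = defaultdict(list)
    for customer_id, timestamp, page in rows:
        grouped[customer_id].append((timestamp, page))
    # Sort each customer's events by timestamp, then keep only the page names.
    return {cid: [page for _, page in sorted(events)] for cid, events in grouped.items()}

sequences = sequences_by_customer(rows)
print(sequences)  # {123: ['page1', 'page2', 'checkout'], 456: ['home', 'cart']}
```

Each customer now corresponds to exactly one (variable-length) sequence, which is the unit the clustering has to work on.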

My question is: what algorithm or approach would you use for clustering in this scenario? K-means is very popular for this type of problem, but I don't know whether it can be used here, because each customer spans several rows. Is it possible to do cluster analysis at the level of a single customer ID that includes multiple rows?

Any help or direction on which unsupervised learning approach to take is appreciated.

Marc Frankel
  • Seems like you should create an embedding for each customer entry. One way of doing this could be treating them as a sequence of events and using existing techniques from natural language processing. – xxbidiao May 16 '19 at 20:57
  • @xxbidiao Could you explain in a little more depth how one might do this? I've come across libraries like Word2Vec that work in our environment (Python), and I understand your idea of converting the actions into a kind of "sentence" that can then be processed. I'm just confused about the last step: how to turn that into numbers that something like k-means can use. Thanks – Marc Frankel May 17 '19 at 13:06

1 Answer


In short,

  1. Learn a fixed-length embedding (representation) of each event;
  2. Learn a way to combine a sequence of such embeddings into a single representation for each customer (i.e., each sequence of events), then apply your favorite unsupervised method.
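The two steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the answerer's implementation: step (1) uses a hand-made one-hot encoding over a hypothetical page vocabulary `PAGES` (a learned embedding could be swapped in), and step (2) uses the simplest combination, averaging. The helper names `embed_event` and `embed_customer` are hypothetical.

```python
# Step (1): a fixed-length embedding per event.
# Assumption: one-hot over a hypothetical page vocabulary; a learned
# embedding (e.g. from an encoder) could replace embed_event.
PAGES = ["home", "page1", "page2", "cart", "checkout"]

def embed_event(page):
    """Return a one-hot vector of length len(PAGES) for one page-view event.

    Raises ValueError if the page is not in the vocabulary.
    """
    vec = [0.0] * len(PAGES)
    vec[PAGES.index(page)] = 1.0
    return vec

# Step (2): combine a customer's sequence of event embeddings into one vector.
# Here: mean pooling, i.e. averaging the event vectors component-wise.
def embed_customer(events):
    """Average the event embeddings of one customer's (non-empty) sequence."""
    vectors = [embed_event(page) for page in events]
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical per-customer sequences, in chronological order.
sequences = {
    123: ["page1", "page2", "checkout"],
    456: ["home", "cart"],
}
customer_vectors = {cid: embed_customer(seq) for cid, seq in sequences.items()}
print(customer_vectors[456])  # [0.5, 0.0, 0.0, 0.5, 0.0]
```

Every customer now has a vector of the same length, which is what k-means needs. Note that averaging discards the order of events; capturing order is what the encoder-decoder option is for.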

For (1), you can either define the embedding manually or learn it with an encoder/decoder. For (2), there is a range of options, from simply averaging the embeddings of the individual events to training an encoder-decoder to reconstruct the original sequence of events and taking the intermediate representation (the one the decoder uses to reconstruct the original sequence).
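Once each customer is a fixed-length vector, a standard clustering method applies directly. Below is a minimal plain-Python k-means (Lloyd's algorithm) as a sketch; the sample `points` are hypothetical averaged per-customer vectors, and initializing with the first `k` points is a simplification.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's k-means. Initial centroids are the first k points (a simplification;
    library implementations use random or k-means++ initialization)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical customer vectors (e.g. averaged one-hot over home/product/cart/checkout).
points = [
    [0.5, 0.5, 0.0, 0.0],  # browsing-heavy customers
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.2, 0.4, 0.4],  # purchase-heavy customers
    [0.0, 0.0, 0.5, 0.5],
]
labels, centroids = kmeans(points, k=2)
print(labels)  # [1, 1, 0, 0]
```

In practice, scikit-learn's `sklearn.cluster.KMeans(n_clusters=2).fit_predict(X)` does the same job with better initialization (k-means++ by default).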

A good read on this topic (though a bit old; Transformer networks are now also an option):

Representations for Language: From Word Embeddings to Sentence Meanings

xxbidiao