
I am fairly new to ML, and I am having difficulty working out how to use Spark's machine learning libraries with time series data that represents a sequence of events.

I have a table that contains this info:

StepN#, element_id, Session_id

Here StepN# is the position at which each element appears in the sequence, element_id is the element that was clicked, and Session_id identifies the user session in which the click happened.

The table contains multiple sessions, each with its own sequence of elements, i.e. one session spans multiple rows. Every session also has the same starting and ending point.
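To make the layout concrete, here is a minimal sketch in plain Python (not Spark) of how the rows would be turned into one ordered sequence per session. The sample rows and the helper name `sessions_to_sequences` are made up for illustration; the real table is much larger.

```python
from collections import defaultdict

# Hypothetical sample rows in the table's column order:
# (StepN#, element_id, Session_id). Values are invented for illustration.
rows = [
    (2, "search", "s1"),
    (1, "home", "s1"),
    (3, "checkout", "s1"),
    (1, "home", "s2"),
    (2, "cart", "s2"),
    (3, "checkout", "s2"),
]

def sessions_to_sequences(rows):
    """Group rows by session and order each session's elements by step number."""
    by_session = defaultdict(list)
    for step_n, element_id, session_id in rows:
        by_session[session_id].append((step_n, element_id))
    # Sorting the (step, element) pairs orders each session by step number.
    return {
        session_id: [element for _, element in sorted(steps)]
        for session_id, steps in by_session.items()
    }

sequences = sessions_to_sequences(rows)
```

Each session then becomes a list of element ids in click order, and all sessions start and end with the same element.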

My objective is to train a model on the observed element sequences that predicts the element most likely to be clicked next. In other words, I need to predict the next event given the previous events.

(Put differently, I need to average users' click behavior for a specific workflow so that the model can predict the most relevant next click based on that average.)

From the papers and examples I find online, I understand that this approach makes sense when a single sequence of events is used as the input for training the model.

In my case, though, I have multiple sessions/instances of events (all starting at the same point), and I would like to train an averaging model. I find it challenging to understand how to approach that using, for example, an HMM in Spark. Is there any practical example or tutorial that covers this case?

Thank you for spending the time to read my post. Any ideas would be appreciated!

Kratos

1 Answer


This can also be solved with frequent pattern mining. See: https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html

With this approach, you find items that frequently occur together. In the first step, you teach the model which patterns are frequent. Then, in the prediction step, the model is given some observed events and predicts the events that most commonly accompany them.
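As a rough illustration of the two steps, here is a sketch in plain Python rather than Spark. It is a much simplified stand-in for frequent pattern mining: it only counts which element most often follows each element, pooled over all sessions, and it does not use the Spark API from the link. The sample sessions and the function names `fit_next_counts` and `predict_next` are assumptions made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical sessions, each an ordered list of clicked element ids.
sessions = [
    ["home", "search", "product", "checkout"],
    ["home", "search", "product", "cart", "checkout"],
    ["home", "cart", "checkout"],
]

def fit_next_counts(sessions):
    """Step 1: count how often each element is directly followed by another,
    pooling the counts over all sessions."""
    counts = defaultdict(Counter)
    for seq in sessions:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts

def predict_next(counts, last_element):
    """Step 2: return the most frequent follower of the last observed element,
    or None if that element was never followed by anything in training."""
    followers = counts.get(last_element)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

counts = fit_next_counts(sessions)
prediction = predict_next(counts, "home")
```

Because the counts are pooled across sessions, the prediction reflects the average behavior of all users rather than any single sequence. Looking only at the last element is a simplification; mining longer frequent subsequences would take more of the click history into account.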

Masoud