Anomaly detection using Markov chains

Question

I'm trying to detect anomalies using Markov chains. I have a training dataset containing a sequence of events, which I used to build a transition probability matrix. I then build a second matrix from a test dataset, and I'm looking for a way to compare the two matrices in order to spot anomalies. For example, suppose the transition from event A to event C happens 0 times in the training data, so its probability in the training matrix is 0. If this transition does happen in the test dataset, its probability there will be greater than 0. This is the kind of thing I'd like to detect.
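A minimal sketch of the setup described above, assuming each dataset is a single list of event labels. The sequences train and test are made-up toy data, and transition_matrix is a hypothetical helper name. It builds row-normalised transition matrices with pd.crosstab(..., normalize="index"), aligns both matrices on the same set of states, and lists transitions that have probability 0 in training but a positive probability in test:

```python
import pandas as pd

def transition_matrix(events):
    """Row-normalised transition matrix built from one sequence of event labels."""
    s = pd.Series(events)
    # Pair each event with the event that follows it (the last event has no successor).
    current = s.iloc[:-1].reset_index(drop=True)
    following = s.iloc[1:].reset_index(drop=True)
    # normalize="index" divides each row by its total, giving P(next | current).
    return pd.crosstab(current, following, normalize="index")

# Toy sequences standing in for the real training and test data (hypothetical).
train = ["A", "B", "A", "B", "C", "A", "B"]
test = ["A", "C", "A", "B", "C"]

# Align both matrices on the same states; rows/columns missing from one become 0.
states = sorted(set(train) | set(test))
p_train = transition_matrix(train).reindex(index=states, columns=states, fill_value=0.0)
p_test = transition_matrix(test).reindex(index=states, columns=states, fill_value=0.0)

# Transitions that never happened in training but do happen in the test data.
unseen = (p_train == 0) & (p_test > 0)
unseen_pairs = [(r, c) for r in states for c in states if unseen.loc[r, c]]
print(unseen_pairs)
```

With this toy data the only flagged transition is A -> C. The reindex step matters because a state that never appears in one dataset would otherwise be missing from that matrix entirely, and the two matrices could not be compared cell by cell.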

I tried simply subtracting the two matrices and reporting every entry that is larger than 0, but this doesn't work well: a probability of 0 in training and 0.1 in test is more relevant (and more anomalous) than a probability of 0.7 in training and 0.6 in test, and plain subtraction does not reflect that. Moreover, subtraction treats a change from 0.5 to 0.7 as more anomalous than a change from 0.0 to 0.1.
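One way to get the ordering described above is to compare the probabilities on a relative (log-ratio) scale instead of subtracting them. This is a swapped-in suggestion, not something from the question; the smoothing constant eps = 0.01 is an arbitrary assumption that keeps the score finite when the training probability is 0, and log_ratio_score is a hypothetical helper:

```python
import math

def log_ratio_score(p_train, p_test, eps=0.01):
    """Log of how much more likely a transition is in test than in training.

    eps is an arbitrary smoothing constant (an assumption) that keeps the
    score finite when p_train is 0; positive scores mean "more frequent in test".
    """
    return math.log((p_test + eps) / (p_train + eps))

# The (training, test) probability pairs from the question.
examples = [(0.0, 0.1), (0.7, 0.6), (0.5, 0.7)]
for p_tr, p_te in examples:
    diff = p_te - p_tr
    print(f"{p_tr} -> {p_te}: difference {diff:+.2f}, log-ratio {log_ratio_score(p_tr, p_te):+.2f}")
```

For these pairs the log-ratio ranks 0 -> 0.1 highest (about 2.4), while the plain difference ranks 0.5 -> 0.7 highest. The value of eps controls how strongly previously unseen transitions dominate the ranking: the smaller it is, the larger the score for any transition with training probability 0.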

Likewise, a probability of 0.3 in training and 0.6 in test is more important (because it doubled) than 0.7 in training and 1.0 in test (which may simply mean that the other outgoing events did not happen in the test set, which is fine). For reference, I use pandas crosstab and Series to calculate the transition matrices.
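A different angle on the 0.7 -> 1.0 case (again an assumption, not the asker's method): instead of comparing whole rows of two matrices, score each transition that actually occurs in the test sequence by its surprisal, -log P_train(next | current). A row shifting to 1.0 because other events did not happen then raises nothing by itself; only transitions that do occur and were unlikely in training score high. The pseudo-count alpha = 1 (additive smoothing) is an assumed value that gives unseen transitions a large but finite score; the helper names and toy sequences are hypothetical:

```python
import math
import pandas as pd

def transition_counts(events):
    """Raw transition counts (rows: current event, columns: next event)."""
    s = pd.Series(events)
    return pd.crosstab(s.iloc[:-1].to_numpy(), s.iloc[1:].to_numpy())

def surprisal_scores(train, test, alpha=1.0):
    """Score each transition in `test` by -log P_train(next | current).

    P_train comes from training counts with additive smoothing: alpha is an
    assumed pseudo-count added to every cell, so transitions never seen in
    training get a large but finite score instead of infinity.
    """
    states = sorted(set(train) | set(test))
    counts = transition_counts(train).reindex(index=states, columns=states, fill_value=0)
    smoothed = counts + alpha
    probs = smoothed.div(smoothed.sum(axis=1), axis=0)
    return [(cur, nxt, -math.log(probs.loc[cur, nxt])) for cur, nxt in zip(test[:-1], test[1:])]

# Toy sequences standing in for the real data.
train = ["A", "B", "A", "B", "C", "A", "B"]
test = ["A", "C", "A", "B", "C"]

scores = surprisal_scores(train, test)
for cur, nxt, score in sorted(scores, key=lambda t: t[2], reverse=True):
    print(f"{cur} -> {nxt}: {score:.2f}")
```

Here A -> C, never seen in training, gets the highest score (-log(1/6), about 1.79). Turning this into an alarm would still need a threshold on the score, or on its sum over a window of transitions; how to choose it is left open.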

Have you seen the Dirichlet distribution? If I understand the question, it's probably a good place to start — Sam Mason, Jan 27 '20 at 12:29
Hmm, I'm not sure what you mean. Should I use the Dirichlet distribution to report differences between the two probability matrices, or use it instead of the Markov chain? — user7312969, Jan 27 '20 at 12:59
Sorry, that wasn't clear at all! It sounds like you're turning counts of events into transition probabilities. The problem with this is that if you only have a "few" events, there's going to be a lot of variance in these estimates. The Dirichlet distribution is a nice way of handling this uncertainty: e.g. if you were only in a state once, you're not going to be very certain about its outgoing transitions, while if you've been in it a million times, you have a good chance of saying whether a transition is anomalous — Sam Mason, Jan 28 '20 at 16:05
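The comment's suggestion can be sketched as follows, assuming a symmetric Dirichlet(alpha) prior with alpha = 1 on each row of the transition matrix (the prior, its value, and the helper name dirichlet_row_posterior are assumptions). With multinomial counts the posterior for a row is Dirichlet(alpha + counts), and each transition probability is marginally Beta(a_i, a_0 - a_i), which gives a posterior mean and a standard deviation that shrinks as the state is visited more often:

```python
import math

def dirichlet_row_posterior(counts, alpha=1.0):
    """Posterior mean and standard deviation of each transition probability in one row.

    Assumes a symmetric Dirichlet(alpha) prior over the row. With multinomial
    counts the posterior is Dirichlet(alpha + counts); each component is
    marginally Beta(a_i, a_0 - a_i).
    """
    a = {state: n + alpha for state, n in counts.items()}
    a0 = sum(a.values())
    return {
        state: (ai / a0, math.sqrt(ai * (a0 - ai) / (a0 ** 2 * (a0 + 1))))
        for state, ai in a.items()
    }

# Outgoing counts from one state over its two possible successors (toy data).
few = dirichlet_row_posterior({"B": 1, "C": 0})      # state visited once
many = dirichlet_row_posterior({"B": 1000, "C": 0})  # state visited 1000 times

for label, post in [("1 visit", few), ("1000 visits", many)]:
    mean, sd = post["C"]
    print(f"{label}: P(next = C) mean {mean:.4f}, sd {sd:.4f}")
```

For a state visited once (observed only going to B), the posterior for the unseen transition to C has mean 1/3 and standard deviation about 0.24; after 1000 visits it is about 0.001 with standard deviation about 0.001, so a later C is far more surprising. The posterior mean (count + alpha) / (n + K * alpha), with K the number of possible next states, is the same estimate as additive smoothing.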
