Python unsupervised learning for predicting sequence in set of new data

Question

I am using python to model data.Is there anyone to help me to get an idea about to choosing right algorithm for a case.

Let's say, If I want to predict the sequence of the operations with mix set of operations. And there are different combinations of possible sequences in the past, they are different because because there is other outlier effects that change this sequence.

For every process(there are more than hundred process) I have almost 30-40 operation given me as a mix set. And for all process I have past data with these operations and their sequence number. The point that I stack, operations does not have permanent sequence, it is change by "process specific operation set".

example data:

process 1: past data1: [123, 245, 65, 900 ,78, 456 , ..., 45,893]

process 1: past data2: [123, 65, 900 , 245 , 456, 78, ..., 45,893]

....... [p.s. numbers in the arrays are the operation codes]

I think, I need to use unsupervised learning for the predict to sequence of new operation set. But I cannot figure out which algorithm and what kind of road map should I follow.

Any help?

score 1 · Answer 1 · answered Nov 18 '17 at 21:13

First of all, your question is poorly worded and it is not clear what it is that you want to predict: Do you want to predict the rest of a sequence given its beginning? Or, given a bunch of a sequences, do you want to generate a sequence from scratch that is most likely?

In any case, since you have examples of actual data sequences, they can be used as training samples, and thus you would be benefited more by supervised learning methods, as opposed to unsupervised learning methods. You can try to model your problem as such: Given a sequence of inputs X₁, X₂, ..., X_n, what is the most likely following element, X_n+1?

This is an interesting, and challenging problem. Off the top my of head, it seems similar to word prediction in a paragraph, a problem tackled by word2Vec. However, I'm assuming the range of possible values for any given variable are smaller, so you might find this to be interesting - it is an introduction to sequence prediction with RNNs.

Let me explain in different way. There are bunch of operation codes for every process. Lets take one process as an example. There are different combinations of operation codes for this process. And they are in a sequence in separate arrays, every array shows sequence of operation codes for same process in different time. Like array1:[234,345,...34], array2: [23,345,....,34] etc. These arrays very good to use training set. And the one that I want to predict is in a new array but in a mix sequence, I want to predict to operation codes sequence according to past data. — dss, Nov 18 '17 at 21:55

Python unsupervised learning for predicting sequence in set of new data

1 Answers1