Let me begin by saying that my knowledge about machine learning is very, very limited. But I guess I could use my situation to learn it.
My problem domain evolves 18 well-known patterns. Those patterns are assigned to users once they get created in the system, by the order they come in.
The focus now is to import user data from a different system and the pattern information is not included in it. The patterns exist to make sure every user gets a job schedule. For the users being imported, I'll have to figure what their pattern is by observing their schedule. It's important to note that it's very common for their current schedule to not meet any known pattern entirely, so what I have to do is to find the most likely known pattern.
Reading through Accord's classification classifiers I thought Sequence classification could be a good fit for the problem so I tried to use it, as follows:
class Program
{
static void Main(string[] args)
{
int[][] inputs =
{
new[] {1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 1}, //pattern 1
new[] {1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1}, //pattern 2
new[] {1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 3, 3, 3, 3, 3}, //pattern 3
new[] {3, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 3, 3, 3, 3}, //pattern 4
new[] {3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 3, 3, 3}, //pattern 5
new[] {3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 3, 3}, //pattern 6
new[] {3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 3}, //pattern 7
new[] {3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1}, //pattern 8
new[] {1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1}, //pattern 9
new[] {1, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1}, //pattern 10
new[] {1, 1, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1}, //pattern 11
new[] {1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 2}, //pattern 12
new[] {2, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2}, //pattern 13
new[] {2, 2, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2}, //pattern 14
new[] {2, 2, 2, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2}, //pattern 15
new[] {2, 2, 2, 2, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2}, //pattern 16
new[] {2, 2, 2, 2, 2, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1}, //pattern 17
new[] {1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 1, 1} //pattern 18
};
int[] outputs =
{
0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17
};
int[][] actualData =
{
new[] {3,3,1,1,1,1,2,2,2,2,2,1,1,1,1,3,3,3} // should be pattern 5
};
// Create the Hidden Conditional Random Field using a set of discrete features
var function = new MarkovDiscreteFunction(states: 3, symbols: 3, outputClasses: 18);
var classifier = new HiddenConditionalRandomField<int>(function);
// Create a learning algorithm
var teacher = new HiddenResilientGradientLearning<int>(classifier)
{
MaxIterations = 1000
};
// Run the algorithm and learn the models
teacher.Learn(inputs, outputs);
// Compute the classifier answers for the given inputs
int[] answers = classifier.Decide(actualData);
foreach (var answer in answers)
{
Console.WriteLine(answer);
}
}
}
I expected the output to be pattern 5 as they are an exact match, but that was not the case. I tried to train the model with more inputs by repeating the patterns an associating the input to the correct pattern. The actual data consist in more than 18 values. But it didn't help it match it, actually made it "worse".
In my ideal situation, the program would be able to always guess the known patterns right and find best candidates in data that don't match them. Have I chosen the wrong path here?