Classifier for diff reports

Question

I am new to ML. I have a diff report in which each diff is annotated as either good or bad. For example:

OLD STRING    NEW STRING    DIFF ANNOTATION
abc           AbC           good
pqr           xyz           bad
lmn           wxy           good
....

Given this training set, is it possible to use a classifier to predict annotations for future diff reports, assuming they have similar content? If so, which classifier is most suitable for this task?

My flag: Off topic, too theoretical. Not programming / SW specific. www.cs.stackexchange.com if anywhere. - The Unfun Cat, Nov 15 '12 at 19:54

score 1 · Answer 1 · answered Nov 15 '12 at 12:44


There is no way of knowing which classifier is "best" unless you try several and tune their parameters. Weka can get you started if you are a beginner in this area.

Patrick Koh


score 0 · Answer 2 · answered Nov 16 '12 at 15:10

Classifiers are not magic wands that can take in anything and make sense of it. You need to break your data down into "features" or "signals" in which the classifier can detect a pattern, which it can then use to label data automatically in the future. From the example training set you have given us (three short lines), it is impossible for anybody to guess what recurring commonalities exist in the data that a classifier could exploit.
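To make "features" concrete, here is a minimal Python sketch that turns one (old, new) pair into a few numeric signals. The function name `diff_features` and the three signals are illustrative guesses, not something the question's data is known to depend on; only the three rows come from the question.

```python
from difflib import SequenceMatcher


def diff_features(old: str, new: str) -> dict:
    """Turn one (old, new) pair into a few numeric signals.

    The signals chosen here are hypothetical examples; real ones
    should come from knowing what makes a diff good or bad.
    """
    return {
        # 1.0 if the strings differ only by letter case, else 0.0
        "case_only_change": float(old != new and old.lower() == new.lower()),
        # Character-level similarity in [0, 1], ignoring case
        "similarity": SequenceMatcher(None, old.lower(), new.lower()).ratio(),
        # Absolute change in length
        "length_delta": float(abs(len(new) - len(old))),
    }


# The three rows shown in the question
rows = [("abc", "AbC", "good"), ("pqr", "xyz", "bad"), ("lmn", "wxy", "good")]

for old, new, label in rows:
    print(old, new, label, diff_features(old, new))
```

With these particular signals, the second and third rows produce identical feature values yet carry different labels, so whatever separates them is not captured here and would have to come from other signals.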
It might be possible to predict the annotation for a line automatically if you can think of some potential signals that a computer could study and then use to make an informed guess. The best choice of classifier depends mostly on what kind of signals you pick. If there are recurring words in the strings, then Naive Bayes might do the trick; if the signals you come up with form a vector of numbers, then logistic regression or an SVM would be good choices to experiment with.
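For the numeric-vector case, a rough sketch of logistic regression trained by plain gradient descent, using only the Python standard library, could look like this. The feature vectors and labels below are made up purely for illustration (they are not from the question), and the function names, learning rate and epoch count are arbitrary assumptions; in practice a library implementation would be the usual choice.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the average gradient
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that x belongs to the 'good' class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical feature vectors: [case_only_change, similarity]
# Hypothetical labels: 1 = good, 0 = bad
X = [[1.0, 1.0], [0.0, 0.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.9], [0.0, 0.2]]
y = [1, 0, 1, 0, 1, 0]

w, b = train_logistic(X, y)
print("P(good) for a case-only change:", predict_proba(w, b, [1.0, 1.0]))
print("P(good) for unrelated strings:", predict_proba(w, b, [0.0, 0.0]))
```

The same vectors could be fed to an SVM instead; which of the two works better is something only experimentation on real annotated data can show.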
