Self learning data evaluation in Python

Question

I have about 300,000 images, all categorized manually as "clip art" or "photo". For each image I can calculate three independent numerical features that give a clear hint about whether the image is clip art or a photo. None of these numbers alone is enough to auto-categorize new images reliably. Used in combination, however, auto-categorizing should be pretty accurate.

I can manually fiddle around, test hundreds of images, and observe the data. That way I can empirically find more or less suitable weighting factors or something similar. But I do have 300,000 properly categorized data sets ... I should be able to use this data to categorize new images pretty reliably. But how? I don't even know the proper terms to Google for an answer: is it "self learning" or a "neural network" or "artificial intelligence" that I'm looking for? How do I start in Python to solve this?

You are looking for machine learning (supervised learning, since you have the category). There are a lot of techniques, though SVM seems to work most of the time. – fredtantini, Dec 31 '14 at 09:18
I think that the downvote is because the question is [off-topic](http://stackoverflow.com/help/on-topic) – fredtantini, Dec 31 '14 at 09:19

score 3 · Accepted Answer · answered Dec 31 '14 at 09:17

Your task is called classification and is a part of machine learning. This seems to be a very brief introduction to this field.

I don't know of a handy Python library (I'm not saying there are none, but I don't use such things so I don't know of any), but some ML algorithms and classification models are very easy to implement yourself (e.g. k-NN, or a linear classifier/regressor).
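To give an idea of how little code k-NN needs, here is a minimal sketch in plain Python. It assumes each image is represented by a tuple of its three features; the function name `knn_predict` and all the numbers are made up for illustration and are not taken from the question's data.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    # Keep the k closest points and take a majority vote over their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical feature triples, invented for this example
train_features = [
    (0.10, 0.20, 0.10),
    (0.20, 0.10, 0.30),
    (0.15, 0.25, 0.20),
    (0.90, 0.80, 0.70),
    (0.80, 0.90, 0.90),
    (0.85, 0.70, 0.80),
]
train_labels = ["clip art"] * 3 + ["photo"] * 3

# Classify a new, unlabeled image from its three features
prediction = knn_predict(train_features, train_labels, (0.82, 0.75, 0.88))
print(prediction)
```

If the three features live on very different scales, it is probably worth rescaling them to a common range first, since otherwise the largest one dominates the distance. Holding back part of the labeled images to check the predictions is also a reasonable way to pick k.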

zegkljan

Thanks, even the term "machine learning" helps. I'm currently studying your recommended read. – Simon Steinberger Dec 31 '14 at 09:18
@SimonSteinberger Scikit may be a good library to look at for your task: http://scikit-learn.org/stable/ – New2WebDevelopment Dec 31 '14 at 10:32
@New2WebDevelopment: Already on it :-) Seems to be perfect for the task. Thanks! – Simon Steinberger Dec 31 '14 at 10:36
