
I have about 300,000 images, all manually categorized as "clip art" or "photo". For each image I can calculate three independent numerical features that give a clear hint about whether the image is a clip art or a photo. None of these numbers alone is enough to auto-categorize new images reliably. Used in combination, however, they should make auto-categorization pretty accurate.

I can manually fiddle around, test hundreds of images and observe the data, and so empirically find more or less suitable weighting factors or something similar. But I have 300,000 properly categorized data sets, so I should be able to use this data to categorize new images reliably. But how? I don't even know the proper terms to Google: is it "self learning", a "neural network" or "artificial intelligence" that I'm looking for? How do I start solving this in Python?
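The hand-tuned weighting factors described above are exactly what a linear classifier learns automatically from labeled data. As a minimal sketch, assuming each image is reduced to a tuple of its three feature values and the labels are encoded as +1 for "photo" and -1 for "clip art", the classic perceptron rule learns one weight per feature plus a bias. The six feature tuples are made-up placeholders, not real image data, and `train_perceptron`/`predict` are hypothetical helper names:

```python
from typing import List, Sequence, Tuple


def train_perceptron(samples: Sequence[Sequence[float]], labels: Sequence[int],
                     epochs: int = 20, lr: float = 1.0) -> Tuple[List[float], float]:
    """Learn one weight per feature plus a bias using the perceptron update rule.

    labels must be +1 or -1 (assumed encoding: +1 = "photo", -1 = "clip art").
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if target * score <= 0:
                # Wrong side of (or on) the boundary: move the weights toward the target.
                weights = [w + lr * target * xi for w, xi in zip(weights, x)]
                bias += lr * target
                mistakes += 1
        if mistakes == 0:
            # Every training sample is classified correctly; stop early.
            break
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Weighted sum of the features plus bias; the sign decides the category."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


# Hypothetical toy data: three made-up feature values per image.
X = [(0.9, 0.8, 0.7), (0.8, 0.9, 0.6), (0.7, 0.7, 0.9),   # "photo"
     (0.1, 0.2, 0.3), (0.2, 0.1, 0.2), (0.3, 0.2, 0.1)]   # "clip art"
y = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(X, y)
print("learned weights:", w, "bias:", b)
print(predict(w, b, (0.85, 0.75, 0.8)))  # → 1 ("photo")
```

The perceptron only stops making mistakes if the two classes are linearly separable in the chosen features; with 300,000 real images that is unlikely, so the `epochs` cap ends training instead. Features on very different scales should also be normalized first.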

Simon Steinberger
  • You are looking for machine learning (supervised learning, since you have the categories). There are a lot of techniques, though SVM seems to work most of the time. – fredtantini Dec 31 '14 at 09:18
  • I think the downvote is because the question is [off-topic](http://stackoverflow.com/help/on-topic). – fredtantini Dec 31 '14 at 09:19
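A linear SVM, as suggested in the comment, learns one weight per feature (like the hand-tuned weighting factors in the question) but chooses them to maximize the margin between the two classes. The sketch below is a from-scratch illustration using Pegasos-style stochastic sub-gradient descent, assuming labels encoded as +1 ("photo") and -1 ("clip art") and invented toy feature values; the regularization strength `lam`, the iteration count, and the helper names are illustrative assumptions, not tuned values:

```python
import random
from typing import List, Sequence


def train_linear_svm(samples: Sequence[Sequence[float]], labels: Sequence[int],
                     lam: float = 0.01, iterations: int = 20000,
                     seed: int = 0) -> List[float]:
    """Pegasos-style stochastic sub-gradient descent for a linear SVM.

    Approximately minimizes  lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>))
    with labels y in {+1, -1}. A constant 1.0 is appended to every sample so the
    last weight acts as a bias (simplification: that bias is regularized too).
    """
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    for t in range(1, iterations + 1):
        i = rng.randrange(len(data))
        x, target = data[i], labels[i]
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = target * sum(wj * xj for wj, xj in zip(w, x))
        # Gradient step for the regularizer: shrink all weights.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Hinge-loss sub-gradient step, only if the sample is inside the margin.
        if margin < 1.0:
            w = [wj + eta * target * xj for wj, xj in zip(w, x)]
    return w


def svm_predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Sign of the linear score, using the same appended bias feature."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1


# Hypothetical toy data: three made-up feature values per image.
X = [(0.9, 0.8, 0.7), (0.8, 0.9, 0.6), (0.7, 0.7, 0.9),   # "photo" (+1)
     (0.1, 0.2, 0.3), (0.2, 0.1, 0.2), (0.3, 0.2, 0.1)]   # "clip art" (-1)
y = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(X, y)
print("weights (last one is the bias):", w)
print([svm_predict(w, x) for x in X])
```

Appending a constant feature is a common shortcut for the bias term; it regularizes the bias as well, which differs slightly from the textbook SVM formulation. The fixed random seed only makes the toy run reproducible.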



Your task is called classification, which is part of machine learning. This seems to be a very brief introduction to this field.

I don't know of a handy Python library (I'm not saying there are none, but I don't use such things, so I don't know of any), but some ML algorithms and classification models are very easy to implement yourself (e.g. k-NN or a linear classifier/regressor).
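As an illustration of how little code k-NN needs, here is a minimal sketch assuming each image is a tuple of its three features and the labels are the strings "photo" and "clip art". The features are standardized first (z-scores) so that no single feature dominates the Euclidean distance; `fit_scaler`, `scale`, and `knn_predict` are hypothetical helper names and the data is invented:

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_scaler(samples: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population standard deviation of the training data."""
    columns = list(zip(*samples))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero for a constant feature
    return means, stds


def scale(x: Sequence[float], means: Sequence[float], stds: Sequence[float]) -> List[float]:
    """Convert raw feature values to z-scores so every feature has a comparable range."""
    return [(xi - m) / s for xi, m, s in zip(x, means, stds)]


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[str],
                query: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training samples closest (Euclidean) to query."""
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three made-up feature values per image.
X = [(0.9, 0.8, 0.7), (0.8, 0.9, 0.6), (0.7, 0.7, 0.9),
     (0.1, 0.2, 0.3), (0.2, 0.1, 0.2), (0.3, 0.2, 0.1)]
labels = ["photo", "photo", "photo", "clip art", "clip art", "clip art"]

means, stds = fit_scaler(X)
train = [scale(x, means, stds) for x in X]
query = scale((0.85, 0.75, 0.8), means, stds)
print(knn_predict(train, labels, query, k=3))  # → photo
```

This brute-force version compares each query against every training sample, so with 300,000 images each prediction costs 300,000 distance computations. An odd `k` avoids tied votes between the two categories; the value 3 is an assumption and would normally be chosen by checking accuracy on a held-out part of the labeled data. `math.dist` requires Python 3.8 or later.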

zegkljan