
I am working with the nice Python scikits.learn package to train some classifiers for part-based face recognition with Histogram of Oriented Gradients (HOG) features. I've successfully trained a linear SVM to recognize a particular face part, but I am having strange issues with the predict_proba() function.

I use the following training code:

    import numpy as np
    from scikits.learn import svm

    # Do some stuff to prepare DATA matrix of feature vector samples
    # and LABELS vector of 1's and 0's for positive and negative samples.
    clf = svm.SVC(kernel='linear', probability=True)
    clf.fit(DATA, LABELS)

But then when I run predict_proba([test_vector]), I only ever see [[ 0.5 0.5 ]] as the output, i.e. a uniform probability across my two binary classes.
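
For concreteness, here is roughly how I am calling the prediction functions on a single window (test_vector is a placeholder for one HOG feature row; the decision_function call is only an extra diagnostic, assuming it is exposed in this scikits.learn version):

    # test_vector: one HOG feature vector, same length as a row of DATA
    probs = clf.predict_proba([test_vector])    # always prints [[ 0.5  0.5]]
    label = clf.predict([test_vector])          # gives a sensible 0/1 label
    print(probs)
    print(label)
    # Raw SVM margin for comparison; if predict() varies across windows,
    # this value should vary as well.
    print(clf.decision_function([test_vector]))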

Oddly, though, when I just use the predict() function, it performs fairly well, so the classifier is certainly not just treating every window the same way. On a test image I get a dense cluster of '1' classifications around the correct face part, some expected noisy '1' classifications elsewhere in the scene, and predominantly '0' classifications everywhere else.
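
To give a sense of how the classifier is applied to a test image, the scan is roughly the sketch below (image is the test image as a 2-D array, the window/step sizes are illustrative, and hog_descriptor() stands in for my own HOG extraction code, not a library function):

    # Sliding-window scan over the test image (sizes are placeholders).
    win_h, win_w, step = 64, 64, 8
    n_rows = (image.shape[0] - win_h) // step + 1
    n_cols = (image.shape[1] - win_w) // step + 1
    predictions = np.zeros((n_rows, n_cols), dtype=int)
    for i in range(n_rows):
        for j in range(n_cols):
            window = image[i * step:i * step + win_h, j * step:j * step + win_w]
            feat = hog_descriptor(window)       # 1-D HOG feature vector
            predictions[i, j] = clf.predict([feat])[0]
    # predictions ends up with a dense cluster of 1's around the true face
    # part, scattered 1's elsewhere, and mostly 0's.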

What could be causing this malfunction with predict_proba()?

  • I'm not able to reproduce this error using the `0.8` branch of the scikit-learn Git repo. Have you considered upgrading to 0.9? If that doesn't fix the error, you might want to [file a bug report](https://github.com/scikit-learn/scikit-learn/issues/new). – Fred Foo Nov 24 '11 at 14:17
  • Otherwise, post a minimal dataset here that will trigger the bug. – Fred Foo Nov 24 '11 at 14:25
  • Is it possible that there is some size limitation on the number of feature vectors? When I fit a classifier with only the first 5 feature vectors and only the first 5 labels, then `predict_proba()` seems to work fine. But when I repeat the same code with all 7206 features and 7206 labels left in, the resulting classifier only ever gives `[0.5 0.5]` as the probability vector (see the sketch after these comments). – ely Nov 24 '11 at 17:14
  • What does the `cache_size` parameter of the `SVC()` function control? – ely Nov 24 '11 at 17:15
  • Also, I have switched to an RBF kernel and the problem seems to go away. But this itself is puzzling. If the training data is not well separated linearly, does scikits handle this in any special way? – ely Nov 24 '11 at 17:20
  • 7205 features and 7206 labels? I presume you mean *samples* and labels? But `SVC` should really be able to handle such an amount of data without problems. The `cache_size` parameter controls an internal cache. It's given in megabytes (and is only documented in recent Git versions). – Fred Foo Nov 24 '11 at 17:58
  • 1
    After more inspection, it appears that the `predict_proba()` probabilities are generated with cross-validation. Could this have anything to do with it? What would be different between RBF and linear kernels? – ely Nov 25 '11 at 03:50
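
A minimal sketch of the subset experiment described in the comments above (the 5-sample cutoff and the full 7206-sample fit come from the thread; DATA and LABELS are as in the question):

    # Fit on only the first 5 samples: predict_proba gives varied probabilities.
    clf_small = svm.SVC(kernel='linear', probability=True)
    clf_small.fit(DATA[:5], LABELS[:5])
    print(clf_small.predict_proba(DATA[:1]))

    # Fit on all 7206 samples: predict_proba always returns [[ 0.5  0.5]].
    clf_full = svm.SVC(kernel='linear', probability=True)
    clf_full.fit(DATA, LABELS)
    print(clf_full.predict_proba(DATA[:1]))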

0 Answers