I'm using scikit-learn 0.14 and trying to implement a user-defined scoring function for GridSearchCV to evaluate.

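import numpy as np
from sklearn.metrics import make_scorer

def someScore(gtruth, pred):
    pred = np.clip(pred, 0, np.inf)  # negative predictions would break the log
    logdif = np.log(1 + gtruth) - np.log(1 + pred)
    # sine of the root mean squared log difference
    return np.sin(np.sqrt(np.mean(np.square(logdif))))

neg_scorefun = make_scorer(someScore)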

Note: I appended the sine so that the score ranks differently from most other scores, just for testing.

Now, if I run

g1=GridSearchCV(KernelDensity(),param_grid,scoring=neg_scorefun).fit(X)

and then

g2=GridSearchCV(KernelDensity(),param_grid).fit(X)

then

print g1.best_score_, g2.best_score_

gives me exactly the same result. The "best_params_" are identical too. No matter what I put into the function "someScore", the outcome is always the same. I'd expect the results to differ: I tested "someScore" with fixed return values, with negative values, with the sine (as in the example above), and with all sorts of values derived from "ground truth" and "prediction".

From the results, it appears that whatever scoring I pass, the scorer is ignored, overwritten, or never called.

What am I missing? Any suggestions?

  • Can you try with the current version, 0.16.0? – Andreas Mueller Apr 06 '15 at 18:41
  • Tried updating, the results are the same. I tried g1.fit(X,vectorFilledWithOnes) and then got a different error message, namely that "KernelDensity does not have attribute 'predict'". Is it possible that using a scoring function does not make sense for KD? At least it seems to me that it's not intended. The thing is, I wanted to use a scorer to be able to apply a scoring mechanism of my own design to measure the goodness of the density estimation, but now it appears that I will need to choose another path. Am I right to give up on using GridSearchCV with KernelDensity? –  Apr 07 '15 at 07:34
  • Wait, I am a bit confused about what you are trying to do. What is your ground truth? KD is an unsupervised algorithm, and you can use scoring functions, but only unsupervised ones. What is it that you are trying to measure? – Andreas Mueller Apr 07 '15 at 14:41
  • Basically I want to do hotspot detection. Given a set of points on a 2D plane (x/y values), I want to estimate a density and use that density to derive hotspot locations. The hotspots need to fulfill several criteria, which the scoring function should take into account. The scoring function I provided in the question is just for testing; as I'm new to Python, I wanted to play around to understand how things work. With the help of the scoring function and GridSearchCV, I wanted to select parameters for KD so that my constraints for the hotspots are met. –  Apr 08 '15 at 19:20
  • That is totally possible. Create your own scoring object like described here: http://scikit-learn.org/dev/modules/model_evaluation.html#implementing-your-own-scoring-object That will allow you to use any properties of the KD object. – Andreas Mueller Apr 08 '15 at 20:05
  • I knew that document, but thanks anyway. I simply didn't understand how exactly the custom scoring object is to be created. But now I'm on a good path, and I will post my solution once I'm done. Thanks for the posts. –  Apr 11 '15 at 18:43
  • What part did you not understand? – Andreas Mueller Apr 13 '15 at 16:33
  • Well, for starters, how to create an object. As I mentioned, I'm new to Python and to OOP, so I had no clue what to do first. It would have made my life easier if the doc had just stated that I'd need a class with __init__ and __call__ and some more... –  Apr 20 '15 at 05:29
  • You could have also defined a function. – Andreas Mueller Apr 20 '15 at 20:22
  • That's what I tried first, as you can see above. If a function would have been sufficient, can you provide an example of how that would have worked? I have a solution by now which appears suitable for my problem, but I'd be interested in different approaches too. –  Apr 22 '15 at 11:47

1 Answer

If you want to use a custom function, you need to define a callable with the following signature:

def myscoring(estimator, X, y):
    return np.mean(estimator.predict(X) != y)
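For an unsupervised estimator such as KernelDensity, which has no predict method, a callable with this signature could look roughly like the sketch below. It assumes Python 3 and a recent scikit-learn, where GridSearchCV is imported from sklearn.model_selection rather than sklearn.grid_search. The name mean_log_density, the synthetic two-cluster data, and the bandwidth grid are illustrative and not taken from the question. Because fit(X) is called without labels, y gets a default of None.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity


def mean_log_density(estimator, X, y=None):
    """Score a fitted density estimator on held-out points.

    Returns the average log-density of X under the estimator.
    Higher is better, which is what GridSearchCV expects.
    """
    return float(np.mean(estimator.score_samples(X)))


# Illustrative data (an assumption): two clusters of 2D points.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 2)),
    rng.normal(5.0, 1.0, size=(100, 2)),
])
rng.shuffle(X)  # shuffle rows so each CV fold contains both clusters

# Illustrative grid (an assumption), not the asker's param_grid.
param_grid = {"bandwidth": [0.1, 0.5, 1.0, 2.0]}

search = GridSearchCV(KernelDensity(), param_grid,
                      scoring=mean_log_density, cv=3)
search.fit(X)  # no y: the scorer is called with the estimator and X only

print(search.best_params_, search.best_score_)
```

Any criterion that can be computed from the fitted estimator and the held-out points can replace the body of mean_log_density. An object whose __call__ method has the same signature works in the same way, which is useful when the scorer needs parameters of its own.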

Andreas Mueller