
I am trying to optimize a pipeline and wanted to try giving RandomizedSearchCV a np.random.RandomState object. I can't get it to work, but I can give it other distributions.

Is there a special syntax I can use to give RandomizedSearchCV a np.random.RandomState(0).uniform(0.1, 1.0)?

from scipy import stats
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.model_selection import RandomizedSearchCV  # sklearn.grid_search was removed in scikit-learn 0.20

# Generate data
x = np.random.normal(5,1,size=int(1e3))

# Make model
model = KernelDensity()

# Gridsearch for best params
# This one works
search_params = RandomizedSearchCV(model, param_distributions={"bandwidth":stats.uniform(0.1, 1)}, n_iter=30, n_jobs=2)
search_params.fit(x[:, None])

# RandomizedSearchCV(cv=None, error_score='raise',
#           estimator=KernelDensity(algorithm='auto', atol=0, bandwidth=1.0, breadth_first=True,
#        kernel='gaussian', leaf_size=40, metric='euclidean',
#        metric_params=None, rtol=0),
#           fit_params={}, iid=True, n_iter=30, n_jobs=2,
#           param_distributions={'bandwidth': <scipy.stats._distn_infrastructure.rv_frozen object at 0x106ab7da0>},
#           pre_dispatch='2*n_jobs', random_state=None, refit=True,
#           scoring=None, verbose=0)

# This one doesn't work :(
search_params = RandomizedSearchCV(model, param_distributions={"bandwidth":np.random.RandomState(0).uniform(0.1, 1)}, n_iter=30, n_jobs=2)
# TypeError: object of type 'float' has no len()
O.rka

1 Answer


What you observe is expected: the uniform method of an np.random.RandomState object draws a sample immediately at the time of the call, so param_distributions receives a single float rather than a distribution.

In contrast, scipy's stats.uniform() creates a distribution that has yet to be sampled from. (Although I'm not sure it is working as you expect in your case; be careful with the parameters.)
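The difference can be checked directly. This is a small sketch using only the two calls from the question; the variable names are mine, and it assumes a SciPy version whose frozen distributions accept a random_state argument in rvs.

```python
import numpy as np
from scipy import stats

# RandomState.uniform(low, high) draws right away and returns a plain float.
sample = np.random.RandomState(0).uniform(0.1, 1.0)

# stats.uniform(loc, scale) returns a frozen distribution; nothing is drawn yet.
dist = stats.uniform(0.1, 1)

print(type(sample).__name__, hasattr(sample, "rvs"))
print(type(dist).__name__, hasattr(dist, "rvs"))

# Only the frozen distribution can be sampled repeatedly by the search.
# Note the range: loc=0.1 and scale=1 cover [0.1, 1.1], not [0.1, 1.0].
draws = dist.rvs(size=5, random_state=0)
print(draws)
```

The float has no rvs method and no length, which fits the TypeError in the question.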

If you want to incorporate something based on np.random.RandomState(), you have to build your own class, as mentioned in the docs:

This example uses the scipy.stats module, which contains many useful distributions for sampling parameters, such as expon, gamma, uniform or randint. In principle, any function can be passed that provides a rvs (random variate sample) method to sample a value. A call to the rvs function should provide independent random samples from possible parameter values on consecutive calls.
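A minimal sketch of such a class, following the docs passage above. The name SeededUniform and its parameters are hypothetical, not part of scikit-learn or NumPy. As far as I know, recent scikit-learn versions call rvs with a random_state keyword, so the method accepts one; this sketch ignores it and uses its own RandomState instead.

```python
import numpy as np


class SeededUniform:
    """Hypothetical helper: uniform on [low, high) backed by its own RandomState."""

    def __init__(self, low, high, seed=None):
        self.low = low
        self.high = high
        self._rng = np.random.RandomState(seed)

    def rvs(self, size=None, random_state=None):
        # random_state is accepted for compatibility with callers that pass it,
        # but deliberately ignored (assumption: we want our own generator).
        return self._rng.uniform(self.low, self.high, size=size)


bandwidth_dist = SeededUniform(0.1, 1.0, seed=0)
first = bandwidth_dist.rvs()   # each call draws a fresh value
second = bandwidth_dist.rvs()
print(first, second)
```

An instance could then be passed in place of the scipy distribution, e.g. param_distributions={"bandwidth": SeededUniform(0.1, 1.0, seed=0)}. Unlike stats.uniform(0.1, 1), which spans loc to loc + scale, this takes the lower and upper bounds directly.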

sascha
  • Thanks! Is there an error in `RandomizedSearchCV(model, param_distributions={"bandwidth":stats.uniform(0.1, 1)}, n_iter=30, n_jobs=2)`? I was basing it off of https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/ – O.rka Oct 24 '16 at 19:00
  • @O.rka The uniform class does not implement constructor arguments, [as I see it](https://github.com/scipy/scipy/blob/v0.18.1/scipy/stats/_continuous_distns.py#L4883). The parent class it inherits from only uses named arguments like a and b for the range. So I fear that what you are doing samples from the default range of (0, 1), but I'm not 100% sure about that. It should be easy to check, though. – sascha Oct 24 '16 at 19:01
  • Does it do something like this? `stats.uniform(5,1).rvs(3).tolist() # [5.172340508345329, 5.137135749628878, 5.932595463037163]` or is it different in the backend? – O.rka Oct 24 '16 at 19:21
  • @O.rka I would expect it to just use the `rvs` method like you showed. And it does indeed look like the way you used it is okay (assuming `This distribution is constant between loc and loc + scale.` is what you want) – sascha Oct 24 '16 at 19:42