I would like to run a naive implementation of grid search with MLlib, but I am a bit confused about choosing the 'best' range of parameters. Obviously, I do not want to waste too many resources on combinations of parameters that will probably not give an improved model. Any suggestions from your experience?

Set parameter ranges:

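import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.optimization._

val intercept   : List[Boolean]  = List(false)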
val classes     : List[Int]      = List(2)
val validate    : List[Boolean]  = List(true)
val tolerance   : List[Double]   = List(0.0000001 , 0.000001 , 0.00001 , 0.0001 , 0.001 , 0.01 , 0.1 , 1.0)
val gradient    : List[Gradient] = List(new LogisticGradient() , new LeastSquaresGradient() , new HingeGradient())
val corrections : List[Int]      = List(5 , 10 , 15)
val iters       : List[Int]      = List(1 , 10 , 100 , 1000 , 10000)
val regparam    : List[Double]   = List(0.0 , 0.0001 , 0.001 , 0.01 , 0.1 , 1.0 , 10.0 , 100.0)
val updater     : List[Updater]  = List(new SimpleUpdater() , new L1Updater() , new SquaredL2Updater())

Perform grid search:

val combinations = for (a <- intercept;
                        b <- classes;
                        c <- validate;
                        d <- tolerance;
                        e <- gradient;
                        f <- corrections;
                        g <- iters;
                        h <- regparam;
                        i <- updater) yield (a,b,c,d,e,f,g,h,i)

for( ( interceptS , classesS , validateS , toleranceS , gradientS , correctionsS , itersS , regParamS , updaterS ) <- combinations.take(3) ) {

      val lr : LogisticRegressionWithLBFGS = new LogisticRegressionWithLBFGS().
            setIntercept(addIntercept=interceptS).
            setNumClasses(numClasses=classesS).
            setValidateData(validateData=validateS)

      lr.
            optimizer.
            setConvergenceTol(tolerance=toleranceS).
            setGradient(gradient=gradientS).
            setNumCorrections(corrections=correctionsS).
            setNumIterations(iters=itersS).
            setRegParam(regParam=regParamS).
            setUpdater(updater=updaterS)

}
user706838

1 Answer

Try a randomized search instead of an exhaustive grid (scikit-learn's RandomizedSearchCV is one implementation of the idea), sampling each hyperparameter from a range that spans the relevant orders of magnitude.
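
To make the idea concrete in Scala, here is a minimal sketch of log-uniform sampling for the numeric parameters from the question. The `Trial` case class, the `RandomSearchSketch` object, the ranges, and the budget of 20 trials are all assumptions chosen for illustration; nothing here is part of MLlib. For comparison, the full grid in the question has 8 × 3 × 3 × 5 × 8 × 3 = 8,640 combinations.

```scala
import scala.util.Random

// Hypothetical container for one sampled configuration (illustrative names).
final case class Trial(tolerance: Double, regParam: Double, corrections: Int, iters: Int)

object RandomSearchSketch {
  // Draw a value uniformly in log10 space, so that every order of
  // magnitude between lo and hi is equally likely to be picked.
  def logUniform(rng: Random, lo: Double, hi: Double): Double = {
    val (a, b) = (math.log10(lo), math.log10(hi))
    math.pow(10.0, a + rng.nextDouble() * (b - a))
  }

  // Sample one configuration; the ranges below are assumptions, not recommendations.
  def sample(rng: Random): Trial = Trial(
    tolerance   = logUniform(rng, 1e-7, 1e-1),
    regParam    = logUniform(rng, 1e-4, 1e2),
    corrections = 5 + rng.nextInt(11), // integer in [5, 15]
    iters       = math.round(logUniform(rng, 10.0, 1000.0)).toInt
  )

  def main(args: Array[String]): Unit = {
    val rng    = new Random(42L)            // fixed seed for reproducibility
    val trials = List.fill(20)(sample(rng)) // 20 trials instead of the full grid
    trials.take(3).foreach(println)
  }
}
```

Each sampled `Trial` could then be passed to the same setters used in the question (`setConvergenceTol`, `setRegParam`, `setNumCorrections`, `setNumIterations`), keeping the configuration that scores best on a held-out validation set.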

  • As it seems you're the first to answer this question at all (after 1.5 years)... it's good enough for me that you posted this as an answer, despite lacking the rep for normal commenting. An answer like this should normally be a comment, as it doesn't involve any lines of code. Keep that in mind. Enjoy SO ;-) – ZF007 Mar 06 '18 at 19:55