
I'm working on a LogisticRegression model and trying to debug.

It's a simple setup, but I can't seem to get it to work: I just have the time of day and a state (0 or 1), and I want to predict the state for a given time of day.

There are no errors when training the model, but I see this in the logs:

GradientDescent: GradientDescent.runMiniBatchSGD finished. Last 10 stochastic losses NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN

When I try to evaluate the model, I always end up with the error java.util.NoSuchElementException: key not found: keyname

I have seen this before when feeding the classifier a feature set that is not possible, but here I am only using 1 feature and it's a simple model, so I don't understand what is wrong.

Any idea how I can see what is going on?

I also used BinaryClassificationMetrics and it returns

FmeasureCurve = (NaN,0.17630133869823753)

ROCCurve =

(0.0,0.0) (1.0,1.0) (1.0,1.0)

How would I print the model information to see what values are in there? Is there an easy way to get this data?

When I print the model I only get: org.apache.spark.mllib.classification.LogisticRegressionModel: intercept = 0.0, numFeatures = 1, numClasses = 2, threshold = None

Thanks

zero323
MrE

1 Answer


Not sure what's going on. A few ideas:

(1) Copy your data set into your question.

(2) Make sure that the 0 and 1 cases are interspersed in your data (i.e. there is no hard cut-off in your input space). That ensures that the fitted parameters take finite values.

(3) Call clearThreshold on your model; predict will then yield raw (probabilistic) outputs instead of 0/1 labels.

(4) There is a way to get the fitted parameters, but I forget how. My only advice on this point is to browse the code and try to see how to return the parameters.
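Points (2) and (3) can be illustrated with a small sketch. This is not Spark code: it is a plain-Python gradient descent on a single feature, and the toy data, the `fit` helper, the step counts, and the learning rate are all assumptions made for illustration. It suggests that with a hard cut-off between the classes the weight keeps growing the longer you train (there is no finite optimum), while with interspersed classes it settles. As for (4), in Spark MLlib the fitted parameters should be available as the `weights` and `intercept` members of `LogisticRegressionModel`.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit(xs, ys, steps, lr=1.0):
    """Batch gradient descent on the mean log-loss; returns (weight, intercept)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of the log-loss w.r.t. the score
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Hypothetical toy data: the feature is time of day scaled to [0, 1].
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
separable = [0, 0, 0, 0, 1, 1, 1, 1]     # hard cut-off at 0.5
interspersed = [0, 0, 1, 0, 1, 0, 1, 1]  # classes overlap

w_sep_short, _ = fit(xs, separable, steps=2000)
w_sep_long, _ = fit(xs, separable, steps=20000)
w_mix_short, _ = fit(xs, interspersed, steps=2000)
w_mix_long, b_mix = fit(xs, interspersed, steps=20000)

print("separable:    w after 2000 / 20000 steps:", w_sep_short, w_sep_long)
print("interspersed: w after 2000 / 20000 steps:", w_mix_short, w_mix_long)

# Raw probabilities, analogous to what predict returns after clearThreshold.
print([round(sigmoid(w_mix_long * x + b_mix), 3) for x in xs])
```

If the weight is still moving a lot between the short and the long run, the data may be separable, or the optimizer may be diverging, which is one possible source of NaN losses.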

Robert Dodier
  • I guess the problem was that there was not just one fitted peak in the data but several, and that didn't work well. I ended up using a different algorithm. – MrE Mar 09 '16 at 01:44
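One way to read this comment: a logistic regression on a single raw feature produces a probability that only rises or only falls with that feature, so it cannot represent a state that switches on and off several times a day. The sketch below is plain Python rather than Spark, and the hourly data with two "on" windows is a made-up assumption, as are the `fit` helper and its settings.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit(xs, ys, steps=5000, lr=1.0):
    """Batch gradient descent on the mean log-loss; returns (weight, intercept)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Hypothetical data: one sample per hour, state is 1 in two separate windows.
hours = list(range(24))
xs = [h / 24.0 for h in hours]
ys = [1 if (6 <= h <= 9 or 17 <= h <= 20) else 0 for h in hours]

w, b = fit(xs, ys)
probs = [sigmoid(w * x + b) for x in xs]
preds = [1 if p > 0.5 else 0 for p in probs]
accuracy = sum(p == y for p, y in zip(preds, ys)) / len(ys)

# The predicted probability moves in one direction only across the day.
diffs = [nxt - cur for cur, nxt in zip(probs, probs[1:])]
is_monotone = all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

print("weight:", w, "intercept:", b)
print("monotone in time of day:", is_monotone)
print("training accuracy:", accuracy)
```

Because the prediction is monotone in the time of day, no choice of weight and intercept can match a 0, 1, 0, 1, 0 pattern exactly, which is consistent with switching to a different algorithm (or to richer features) for data like this.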