Basic linear regression: training criterion is NaN

Question

As a CNTK learning exercise, I figured I would modify the Logistic Regression example from lr_bs.cntk, and try to get a basic linear regression working.

Instead of this in the logistic example:

# parameters to learn
b = Parameter (LDim, 1)     # bias
w = Parameter (LDim, SDim)  # weights

# operations
p = Sigmoid (w * features + b)    

lr = Logistic (labels, p)
err = SquareError (labels, p)

# root nodes
featureNodes    = (features)
labelNodes      = (labels)
criterionNodes  = (lr)
evaluationNodes = (err)
outputNodes     = (p)

... I simply changed the code to this:

# operations
p = (w * features + b)

lr = SquareError (labels, p)
err = SquareError (labels, p)
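For reference, here is a minimal pure-Python sketch (not CNTK code) of what the modified network computes for one sample: a linear prediction `w · features + b` and a squared-error criterion. It assumes that SquareError is the plain sum of squared differences. The weights and bias are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from typing import Sequence


def linear_forward(w: Sequence[float], b: float, features: Sequence[float]) -> float:
    """Prediction p = w . features + b for a single scalar label (LDim = 1)."""
    return sum(wi * xi for wi, xi in zip(w, features)) + b


def square_error(labels: Sequence[float], preds: Sequence[float]) -> float:
    """Sum of squared differences (assumed to mirror CNTK's SquareError)."""
    total = 0.0
    for y, p in zip(labels, preds):
        d = y - p
        total += d * d
    return total


# First row of the prepared Wine Quality data: 4 features, label 5.0
x = [7.4, 0.7, 0.0, 1.9]
w = [0.5, 0.0, 0.0, 0.0]  # hypothetical weights, for illustration only
b = 1.0                   # hypothetical bias
p = linear_forward(w, b, x)  # 0.5 * 7.4 + 1.0, i.e. about 4.7
err = square_error([5.0], [p])
print(p, err)
```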

I got this to work on a synthetic dataset I created. However, when I then tried to run it on files I created from the Wine Quality dataset, I couldn't get it to work, and I am at a loss as to how to move forward.

The Train command fails with the following message:

EXCEPTION occurred: The training criterion is not a number (NAN).

I interpret this to mean that lr is not producing a valid number. I just don't understand how SquareError could fail, or how to approach fixing the issue.
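The squared error cannot turn finite inputs into NaN. It only does so once its inputs are already infinite or NaN. The pure-Python sketch below illustrates the chain using ordinary IEEE 754 float behaviour; it is not CNTK-specific. A finite but huge prediction overflows the square to inf. Once the weights themselves are infinite, `inf + (-inf)` or `inf * 0.0` yields NaN, and 0.000 feature values do occur in the data above.

```python
import math


def squared_residual(label: float, pred: float) -> float:
    """(label - pred)^2 via plain multiplication, which overflows to inf instead of raising."""
    d = label - pred
    return d * d


# Finite, reasonable inputs: the result is always finite and non-negative.
ok = squared_residual(5.0, 4.7)

# The prediction has blown up but is still finite: the square overflows to inf.
overflowed = squared_residual(5.0, 1e200)

# Once weights reach +/-inf, the prediction itself becomes NaN.
inf = float("inf")
pred_mixed_signs = inf * 7.4 + (-inf) * 1.9  # inf + (-inf) -> nan
pred_zero_feature = inf * 0.0                # inf * 0 -> nan
broken = squared_residual(5.0, pred_mixed_signs)

print(ok, overflowed, broken, pred_zero_feature)
```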

For reference, here is what the dataset looks like after preparation:

|features 7.400 0.700 0.000 1.900 |labels 5.000
|features 7.800 0.880 0.000 2.600 |labels 5.000
|features 7.800 0.760 0.040 2.300 |labels 5.000
|features 11.200 0.280 0.560 1.900 |labels 6.000
|features 7.400 0.700 0.000 1.900 |labels 5.000

I cannot see any obvious problem with the data. I use the CNTKTextFormatReader to read it, so perhaps the problem lies in the data-reading step, but I can't be sure without debugging.
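One way to rule out malformed rows without a debugger is to scan the prepared file before training. Below is a hypothetical pure-Python helper; it is not part of CNTK. It handles only the dense `|features ... |labels ...` layout shown above and assumes 4 feature values and 1 label per row, a guess that should match SDim and LDim in your config. It does not interpret other CTF features such as sequence IDs or comments, and sparse `index:value` entries will be reported as unparsable.

```python
import math
from typing import Dict, Iterable, List, Tuple


def parse_ctf_line(line: str) -> Tuple[List[float], List[float]]:
    """Parse a dense '|features ... |labels ...' line into (features, labels)."""
    streams: Dict[str, List[float]] = {}
    for chunk in line.strip().split("|"):
        if not chunk.strip():
            continue
        name, *values = chunk.split()
        streams[name] = [float(v) for v in values]
    return streams["features"], streams["labels"]


def check_rows(lines: Iterable[str], feature_dim: int, label_dim: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems: List[str] = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            feats, labs = parse_ctf_line(line)
        except (KeyError, ValueError) as exc:
            problems.append(f"line {n}: cannot parse ({exc!r})")
            continue
        if len(feats) != feature_dim or len(labs) != label_dim:
            problems.append(
                f"line {n}: expected {feature_dim}+{label_dim} values, "
                f"got {len(feats)}+{len(labs)}"
            )
        if not all(math.isfinite(v) for v in feats + labs):
            problems.append(f"line {n}: non-finite value")
    return problems


sample = [
    "|features 7.400 0.700 0.000 1.900 |labels 5.000",
    "|features 7.800 0.880 0.000 2.600 |labels 5.000",
    "|features 7.800 0.760 0.040 2.300 |labels 5.000",
    "|features 11.200 0.280 0.560 1.900 |labels 6.000",
    "|features 7.400 0.700 0.000 1.900 |labels 5.000",
]
print(check_rows(sample, feature_dim=4, label_dim=1))  # prints []
```

To check a real file, pass the open file object instead of `sample`, e.g. `with open(path) as f: print(check_rows(f, 4, 1))`.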

Any advice on how to approach this would be really appreciated.

score 3 · Accepted Answer · answered Apr 10 '17 at 20:21


I had a very similar idea for getting started, except that I modified the Python tutorial for logistic regression in order to create a linear regression example.

I found that the learning rate specified in the logistic example is far too large for the squared error loss used in linear regression. So as a first step, I suggest decreasing learningRatesPerSample to 0.001 or smaller.
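To see why a rate that works for the logistic example blows up here, consider a single per-sample SGD step on one row. This is a simplified analysis. It assumes the criterion is the plain squared difference with no 1/2 factor, and it folds the bias into w by appending a constant 1 to x. CNTK's minibatch and momentum settings will shift the exact threshold.

```latex
p = \mathbf{w}^\top \mathbf{x}, \qquad
L = (y - p)^2, \qquad
\nabla_{\mathbf{w}} L = -2\,(y - p)\,\mathbf{x}

\mathbf{w}' = \mathbf{w} + 2\eta\,(y - p)\,\mathbf{x}
\;\Longrightarrow\;
y - \mathbf{w}'^\top \mathbf{x} = (y - p)\,\bigl(1 - 2\eta\,\lVert\mathbf{x}\rVert^2\bigr)

\bigl\lvert 1 - 2\eta\,\lVert\mathbf{x}\rVert^2 \bigr\rvert < 1
\iff 0 < \eta < \frac{1}{\lVert\mathbf{x}\rVert^2}
```

For the first row in the question, ‖x‖² = 7.4² + 0.7² + 0² + 1.9² + 1 = 59.86. A step therefore stops shrinking that row's residual once η exceeds about 0.0167. For the fourth row (‖x‖² ≈ 130.44), the limit is about 0.0077, and a rate of 0.001 is well inside both. By contrast, with a sigmoid output and the logistic loss, the per-sample gradient is (p − y)x with |p − y| < 1. Each update is then capped at η‖x‖ and cannot grow geometrically.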

I did a quick Google search for the error message you saw, and it returned this issue, which also suggests that the learning rate might be the culprit.

If you are interested, I wrote a blog post about my linear regression example in Python.


Ian Ash


Thank you, I should have thought of that. I didn't see how the squared error could produce "not a number", but a poorly chosen learning rate can indeed cause this :) Reducing the rate fixed the issue. – Mathias Apr 11 '17 at 21:19
I got NaN after 22 epochs, which was actually caused by the error growing every epoch, from about 0.12 (epoch #1) to Infinity (epoch #11) and then NaN (epoch #22). I suppose the large learning rate (0.04) made the adjustments overshoot, causing a higher error each epoch. Changing the learning rate to 0.001 fixed it. – Thomas Hilbert Mar 19 '18 at 23:41
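The overshoot described in the comments above is easy to reproduce outside CNTK. The pure-Python sketch below runs plain per-sample SGD on the five rows from the question, with no minibatches, no momentum, and weights starting at zero. It trains once with a rate of 0.04 and once with 0.001, recording the mean squared error after each epoch. It is only a simplified model of CNTK's training loop, so the epochs at which the error reaches inf and then NaN will not match the numbers reported above.

```python
import math
from typing import List, Tuple

# The five prepared rows from the question: 4 features and one label each.
DATA: List[Tuple[List[float], float]] = [
    ([7.4, 0.70, 0.00, 1.9], 5.0),
    ([7.8, 0.88, 0.00, 2.6], 5.0),
    ([7.8, 0.76, 0.04, 2.3], 5.0),
    ([11.2, 0.28, 0.56, 1.9], 6.0),
    ([7.4, 0.70, 0.00, 1.9], 5.0),
]


def predict(w: List[float], b: float, x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mean_squared_error(w: List[float], b: float) -> float:
    total = 0.0
    for x, y in DATA:
        d = y - predict(w, b, x)
        total += d * d  # overflows to inf (no exception) once d is huge
    return total / len(DATA)


def train(lr: float, epochs: int) -> List[float]:
    """Plain per-sample SGD on (y - p)^2; returns the error after each epoch."""
    w, b = [0.0] * 4, 0.0
    history: List[float] = []
    for _ in range(epochs):
        for x, y in DATA:
            r = y - predict(w, b, x)
            # The gradient of (y - p)^2 is -2 r x for w and -2 r for b.
            w = [wi + 2 * lr * r * xi for wi, xi in zip(w, x)]
            b += 2 * lr * r
        history.append(mean_squared_error(w, b))
    return history


large = train(lr=0.04, epochs=500)
small = train(lr=0.001, epochs=500)
print("lr=0.04 :", large[:3], "...", large[-1])
print("lr=0.001:", small[:3], "...", small[-1])
```

With 0.04, the error grows every epoch until it overflows to inf and then turns into NaN. With 0.001, it settles at a small finite value.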

