Problems using pybrain for classification

Question

I am using pybrain to classify some data. My input is an ndarray with 73 features, and the output should be 0 or 1. My test data has only the input features, not the output, so I want to use the trained neural network to predict the class (0 or 1) for each test sample. My code looks like this:

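from pybrain.datasets import ClassificationDataSet
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure.modules import SoftmaxLayer
from pybrain.supervised.trainers import BackpropTrainer

ds = ClassificationDataSet(73, 1, nb_classes=2)
for i in range(len(new_train_X)):
    ds.addSample(new_train_X[i], y[i])
ds._convertToOneOfMany()
net = buildNetwork(73, 2, 2, outclass=SoftmaxLayer)
trainer = BackpropTrainer(net, dataset=ds, momentum=0.1, verbose=True, weightdecay=0.01)
trainer.trainOnDataset(ds)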

Then I have the test data, which is an ndarray without the output values:

test_result = net.activateOnDataset(test_data).argmax(axis=1)

But this does not return the desired output. The result should be an array of 0s and 1s with the same length as the input data. Is there anything wrong with this? I checked the documentation, and it seems that you can only use the training data and do cross-validation. The error is:

AttributeError: 'numpy.ndarray' object has no attribute 'reset'

Is there a problem with the format of my test data?

score 0 · Answer 1 · answered Jul 09 '15 at 16:26

First of all, try removing axis=1. Second, the output of a net with 2 output neurons is an array of 2 floats representing the activations of the softmax output neurons. argmax() tells you which neuron has the larger activation, which gives the class number.

Finally, ensure that activateOnDataset is executed against another pybrain dataset, not against a numpy array or anything else.
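The AttributeError in the question is consistent with activateOnDataset calling reset() on its argument, which a plain ndarray does not provide. A minimal sketch of wrapping unlabeled test rows in a pybrain dataset could look like the following. The helper name predict_with_dataset and the dummy target value 0 are assumptions made for illustration, not part of the answer above; the dummy targets only fill the slot that addSample expects.

```python
def predict_with_dataset(net, test_X, n_features=73):
    """Wrap unlabeled rows in a ClassificationDataSet and return class indices."""
    # Imported inside the function so that defining the helper does not need pybrain.
    from pybrain.datasets import ClassificationDataSet

    test_ds = ClassificationDataSet(n_features, 1, nb_classes=2)
    for row in test_X:
        # Dummy placeholder target (assumption): the test data has no labels.
        test_ds.addSample(row, [0])

    # One row of output activations per sample.
    activations = net.activateOnDataset(test_ds)

    # For each row, pick the index of the largest activation as the class.
    return [max(range(len(row)), key=lambda k: row[k]) for row in activations]
```

With the objects from the question, this would be called as predict_with_dataset(net, test_data), returning one class index per test row.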

Maksim Khaitovich

Thanks! That's what I'm asking. Do you mean that activateOnDataset should be executed on a dataset built with something like "ds.addSample(new_train_X[i], y[i])"? The test data does not have y values, and I cannot just write "ds.addSample(new_train_X[i])". – user3019893 Jul 09 '15 at 17:19
Also, if the prediction value is 0.37, should I classify it as 0 or 1? – user3019893 Jul 09 '15 at 17:20
Yes, like ds.addSample. But I don't like to activate anything on a dataset. I usually build a pandas DataFrame for the test set, then loop over it and call net.activate() for each X in the test set. As for your second question: the prediction value cannot be '0.37' unless you use 1 output neuron. – Maksim Khaitovich Jul 10 '15 at 12:30
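The per-sample approach from the last comment can be sketched roughly as below. The helper name predict_classes is hypothetical, and the sketch assumes test_X is an iterable of feature rows (for a pandas DataFrame, something like df.values rather than the DataFrame itself, since iterating a DataFrame yields column names).

```python
def predict_classes(net, test_X):
    """Activate the network on each row and return the index of the largest output."""
    predictions = []
    for row in test_X:
        # net.activate returns the output-layer activations for one input vector.
        output = list(net.activate(row))
        # The position of the largest activation is taken as the class number.
        predictions.append(output.index(max(output)))
    return predictions
```

No dataset object and no dummy targets are needed this way, which is why it sidesteps the missing y values.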

score 0 · Answer 2 · answered Oct 09 '18 at 14:50

The result output should be an array with 0 or 1

AFAIK, for that you can simply remove argmax and then use some kind of threshold (such as 0.5) or rounding to get 0 and 1.

out = fnn.activateOnDataset(test_ds)
out_values = [1 if it[0] > 0.5 else 0 for it in out.tolist()]

(based on https://github.com/AlexP11223/SimplePyBrainNeuralNeutwork/blob/master/nn.py)
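For comparison, here is a small sketch of thresholding versus argmax, assuming the 2-output softmax network from the question, where column 0 holds the class-0 activation and column 1 the class-1 activation. The function names and the sample values are made up for illustration. Under that assumption the threshold is applied to column 1; with a single output neuron, as in the snippet above, it[0] would be the value to threshold instead.

```python
def threshold_class1(activations, threshold=0.5):
    """Label a sample 1 when its class-1 activation exceeds the threshold."""
    return [1 if row[1] > threshold else 0 for row in activations]


def argmax_classes(activations):
    """Label each sample with the index of its largest activation."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in activations]


# Made-up softmax outputs: each row sums to 1.
sample_out = [[0.63, 0.37], [0.20, 0.80], [0.45, 0.55]]

by_threshold = threshold_class1(sample_out)
by_argmax = argmax_classes(sample_out)
```

Because two softmax outputs sum to 1, a 0.5 threshold on the class-1 column and argmax agree on these rows; a class-1 activation of 0.37 would be labeled 0 either way.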
