Cross-validation in Pybrain

Question

I'm trying to figure out the right way to do 5-fold cross-validation in pybrain. I went through their documentation, but that didn't help. I found the following two versions of code online:

Found this one in a question here.

net = pybrain.tools.shortcuts.buildNetwork(5, 8, 1)
trainer = BackpropTrainer(net, ds)
evaluation = ModuleValidator.classificationPerformance(trainer.module, ds)
validator = CrossValidator(trainer=trainer, dataset=trainer.ds, n_folds=5, valfunc=evaluation)
print(validator.validate())

Error:

evaluation = ModuleValidator.classificationPerformance(trainer.module, ds)
File ".../pybrain/tools/validation.py", line 168, in classificationPerformance
    dataset)
File ".../pybrain/tools/validation.py", line 204, in validate
    return valfunc(output, target)
File ".../pybrain/tools/validation.py", line 33, in classificationPerformance
    return float(n_correct) / float(len(output))
TypeError: only length-1 arrays can be converted to Python scalars

And the second one here.

  modval = ModuleValidator()
  for i in range(1000):
      trainer.trainEpochs(1)
      trainer.trainOnDataset(dataset=trndata)
      cv = CrossValidator( trainer, trndata, n_folds=5, valfunc=modval.MSE )
      print "MSE %f @ %i" %( cv.validate(), i )

Error (raised from trainer.train()):

File ".../rprop.py", line 43, in train
    for seq in self.ds._provideSequences():
AttributeError: 'NoneType' object has no attribute '_provideSequences'

I went to the source code to try and trace the cause of the error but couldn't figure out what I need to change. Any help appreciated.

When I simply divided the dataset into three parts (training, validation and testing), my code worked well. I only started getting these errors when I tried to implement k-fold cross-validation.

[I think the problem is that they are using `sum()` instead of `np.sum()`](https://github.com/pybrain/pybrain/issues/182). — Stefan Falk, Nov 18 '15 at 11:16
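The difference the comment points at can be reproduced without PyBrain. The sketch below is only an illustration: it assumes the outputs and targets are 2-D arrays with more than one column (the question does not confirm this), and the arrays are made up.

```python
import numpy as np

# Made-up network outputs and targets: 3 samples, 2 output columns
output = np.array([[1, 0], [0, 1], [1, 1]])
target = np.array([[1, 0], [1, 1], [1, 1]])
matches = output == target

# Python's built-in sum() adds the rows together, leaving one count per column
builtin_total = sum(matches)

# np.sum() adds over every element and returns a single number
numpy_total = np.sum(matches)

try:
    float(builtin_total)  # an array with 2 elements cannot become one float
    builtin_failed = False
except TypeError as err:
    builtin_failed = True
    print("built-in sum:", err)

accuracy = float(numpy_total) / matches.size
print("np.sum accuracy:", accuracy)
```

With a single output column the built-in `sum()` yields a one-element array, which is why the same code may appear to work on some datasets and fail on others.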

score 1 · Answer 1 · answered Jan 30 '16 at 15:47

This seemed to work for me:

import numpy as np

from processdata import process_data
from pybrain.datasets import ClassificationDataSet
from pybrain.datasets import SupervisedDataSet
from pybrain.structure import FeedForwardNetwork
from pybrain.structure import LinearLayer, SigmoidLayer
from pybrain.structure import FullConnection
from pybrain.supervised.trainers import BackpropTrainer

n=FeedForwardNetwork()

#Define Layers
inLayer= LinearLayer(200)
hiddenLayer= SigmoidLayer(100)
outLayer = LinearLayer(1)

#Add layers to the neural net module
n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)

#Define Connections
in_to_hidden = FullConnection(inLayer, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)

#add connections to the module
n.addConnection(in_to_hidden)
n.addConnection(hidden_to_out)
#make ready
n.sortModules()

#Define Trainer
#NOTE: ds is not defined in this snippet; it must already be a pybrain dataset whose input size matches the 200-unit input layer
trainer = BackpropTrainer( n, dataset=ds, momentum=0.1, verbose=True, weightdecay=0.005)

#perform crossvalidation
from pybrain.tools.validation import CrossValidator
cv=CrossValidator(trainer=trainer, dataset=ds, n_folds=5) #creates a crossvalidator instance
print(cv.validate()) #runs the cross-validation and prints the value validate() returns

It should output the error for each fold.
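For readers who want to see what a 5-fold split does independently of PyBrain, here is a minimal NumPy-only sketch. It is not PyBrain's implementation: the helper `k_fold_indices`, the toy data, and the straight-line model are all made up for illustration.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.RandomState(seed)
    # Shuffle the sample indices once, then cut them into n_folds chunks
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx

# Toy regression data (hypothetical): y = 2*x + 1 plus a little noise
rng = np.random.RandomState(42)
x = rng.uniform(-1, 1, size=50)
y = 2 * x + 1 + 0.01 * rng.randn(50)

fold_mse = []
for train_idx, test_idx in k_fold_indices(len(x), n_folds=5):
    # Fit a straight line on the training folds only
    slope, intercept = np.polyfit(x[train_idx], y[train_idx], 1)
    # Score it on the held-out fold
    pred = slope * x[test_idx] + intercept
    fold_mse.append(np.mean((pred - y[test_idx]) ** 2))

print("MSE per fold:", fold_mse)
print("mean MSE:", np.mean(fold_mse))
```

Each sample lands in exactly one held-out fold, and the model is refit from scratch for every fold, so the mean of the per-fold errors estimates performance on unseen data.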
