I'm encountering very strange behaviour using the ClassificationDataSet class of the PyBrain library. It seems to add extra samples to the data, and I cannot understand why!

Here is the code:

from pybrain.datasets import ClassificationDataSet

data = [[2, 4, 1],
        [3, 3, 0],
        [1, 2, 1]]
targets = [3, 1, 2]

training_ds = ClassificationDataSet(len(data[0]), nb_classes=3, class_labels=['1', '2', '3'])
for i in range(len(data)):
    training_ds.addSample(data[i], targets[i])

On the first call to addSample, it adds an extra all-zero sample to the input data, along with a target value of 0. On the second iteration the data is the correct size, but on the third iteration it duplicates the data in a seemingly random order, making it a (6, 3) dataset. Does anyone know why it is doing this?

I'm using the latest version of PyBrain.

1 Answer

I had a similar problem.
Mine came from reading ds.data["input"] instead of ds.getField("input").
PyBrain's DataSet preallocates its internal arrays and grows them as samples are added, so ds.data["input"] exposes the raw buffer, including the unfilled zero rows, while ds.getField("input") returns only the rows that have actually been filled. That also explains the apparent "extra samples" and the (6, 3) shape in your question.
The following code shows the problem:

ds.addSample([2, 4, 1], 1)
print(ds.data["input"])
>>> [[2, 4, 1], [0, 0, 0]]

And this is the correct code:

ds.addSample([2, 4, 1], 1)
print(ds.getField("input"))
>>> [[2, 4, 1]]

Maybe this will help.
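
For reference, here is a minimal sketch applying the same fix to the dataset from the question. The import path and method names are PyBrain's, but the exact size of the raw buffer may vary between versions, so the second print is only illustrative:

from pybrain.datasets import ClassificationDataSet

data = [[2, 4, 1],
        [3, 3, 0],
        [1, 2, 1]]
targets = [3, 1, 2]

training_ds = ClassificationDataSet(len(data[0]), nb_classes=3,
                                    class_labels=['1', '2', '3'])
for sample, target in zip(data, targets):
    training_ds.addSample(sample, target)

# getField slices off the unused preallocated rows: a (3, 3) array here.
print(training_ds.getField('input').shape)
# The raw buffer can be larger, padded with zero rows that were never filled.
print(training_ds.data['input'].shape)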