
I am trying to use PyBrain for some simple NN training. What I don't know is how to load the training data from a file; it is not explained anywhere on their website. I don't care about the format, since I can generate the file in whatever format is needed, but I need to load it from a file instead of adding the rows one by one manually, because I will have several hundred rows.

Dr Sokoban
  • Several hundred rows means you have a very small set and shouldn't be concerned about performance. But doesn't PyBrain just accept NumPy arrays? – Fred Foo Nov 15 '11 at 16:46
  • I don't know, I am just starting to use it, but nowhere do they say how to use NumPy arrays with their NN :/ – Dr Sokoban Nov 15 '11 at 17:03

2 Answers


Here is how I did it:

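from pybrain.datasets import SupervisedDataSet
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer

ds = SupervisedDataSet(6, 3)  # 6 inputs, 3 target values per sample

with open('mycsvfile.csv', 'r') as tf:
    for line in tf:
        data = [float(x) for x in line.strip().split(',') if x != '']
        if not data:
            continue  # skip blank lines
        indata = tuple(data[:6])   # first 6 values: inputs
        outdata = tuple(data[6:])  # last 3 values: outputs
        ds.addSample(indata, outdata)

n = buildNetwork(ds.indim, 8, 8, ds.outdim, recurrent=True)
t = BackpropTrainer(n, learningrate=0.01, momentum=0.5, verbose=True)
t.trainOnDataset(ds, 1000)  # train for 1000 epochs
t.testOnData(verbose=True)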

In this case the neural network has 6 inputs and 3 outputs. Each line of the CSV file holds 9 comma-separated values: the first 6 are inputs and the last 3 are the target outputs.
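If you would rather go through NumPy (as suggested in the comments on the question), the same 9-column layout can be split into an input matrix and a target matrix with two slices. This is a minimal sketch, assuming a headerless, comma-separated file; the helper name `load_xy` is hypothetical, and an in-memory string stands in for `mycsvfile.csv`:

```python
import io

import numpy as np


def load_xy(source, n_inputs=6):
    """Split a headerless CSV into (inputs, targets) 2-D arrays.

    Hypothetical helper: the first `n_inputs` columns are treated as
    inputs and all remaining columns as targets.
    """
    # ndmin=2 keeps the result 2-D even if the file has a single row
    data = np.loadtxt(source, delimiter=',', ndmin=2)
    return data[:, :n_inputs], data[:, n_inputs:]


# Two sample rows in the 6-inputs / 3-outputs layout described above
sample = io.StringIO(
    "0.1,0.2,0.3,0.4,0.5,0.6,1,0,0\n"
    "0.6,0.5,0.4,0.3,0.2,0.1,0,1,0\n"
)
X, Y = load_xy(sample)
print(X.shape, Y.shape)  # (2, 6) (2, 3)
```

Each row pair `X[i]`, `Y[i]` should then be usable with `ds.addSample`, or the whole arrays can be assigned in one go with `setField`, as in the pandas answer.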

c0m4
  • That is great, thank you very much. Do you know how I can access the weight values for every neuron? – Dr Sokoban Nov 16 '11 at 09:15
  • You can access the individual layers like this: n['in'] for the input layer, n['out'] for the output layer, or n['hidden0'] for the first hidden layer. I don't know for sure, but I would guess that you can then access the nodes of the layer in some way; dir(n['in']) should give you a hint of what you can do. – c0m4 Nov 16 '11 at 10:32
  • I cannot find out how to do it. I will ask a new question. Thank you for your help. – Dr Sokoban Nov 16 '11 at 11:09
  • @DrSokoban check my answer. It can't get much easier than that. –  Jul 05 '20 at 19:33

You can just use a pandas DataFrame, like this:

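import pandas as pd
from pybrain.datasets import SupervisedDataSet

dataset = SupervisedDataSet(6, 3)

# header=None assumes the CSV has no header row (as in the other answer's
# file); otherwise pandas would consume the first data row as column names
df = pd.read_csv('mycsvfile.csv', header=None)

dataset.setField('input', df.values[:, :6])  # first 6 columns: features

# The target field must be 2-D (one row per sample), even with a single
# target column; a 1-D array gives "IndexError: tuple index out of range".
# Slicing with 6: keeps the 3 target columns as a 2-D array.
y = df.values[:, 6:]

dataset.setField('target', y)  # last 3 columns: targets
del df, y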

and you are good to go.
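Two details in the pandas version are easy to trip over: `pd.read_csv` treats the first line as a header unless you pass `header=None`, and selecting a single column with `[:, 6]` gives a 1-D array, while the target field needs one row per sample. The sketch below only prints the shapes involved and does not touch PyBrain; the small in-memory CSV is an assumption standing in for `mycsvfile.csv`:

```python
import io

import pandas as pd

csv_text = (
    "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9\n"
    "0.9,0.8,0.7,0.6,0.5,0.4,0.3,0.2,0.1\n"
)

# Default: the first data row is consumed as column names
with_header = pd.read_csv(io.StringIO(csv_text))
# header=None: every line is kept as a data row
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(len(with_header), len(no_header))  # 1 2

values = no_header.values
print(values[:, 6].shape)    # (2,)   1-D: a single column, flattened
print(values[:, 6:7].shape)  # (2, 1) 2-D: a single column, one row per sample
print(values[:, 6:].shape)   # (2, 3) 2-D: all three target columns
```

With three target columns, as in `SupervisedDataSet(6, 3)`, slicing with `[:, 6:]` is already 2-D; wrapping each value with `[[x] for x in ...]` is only needed for a single target column, where `[:, 6:7]` should do the same job without the list comprehension.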