I have a basic, working neural network implementation in PyBrain:
import numpy
from pybrain.datasets import SupervisedDataSet
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer

# Load the training and test data.
train_input = numpy.loadtxt('train_input.csv', delimiter=',')
test_input = numpy.loadtxt('test_input.csv', delimiter=',')
train_output = numpy.loadtxt('train_output.csv', delimiter=',')
test_output = numpy.loadtxt('test_output.csv', delimiter=',')

# Scale every column into [0, 1] by dividing by its maximum.
train_input = train_input / train_input.max(axis=0)
test_input = test_input / test_input.max(axis=0)
train_output = train_output / train_output.max(axis=0)
test_output = test_output / test_output.max(axis=0)

# Build the supervised dataset: two inputs, one output per sample.
ds = SupervisedDataSet(2, 1)
for x in range(len(train_input)):
    ds.addSample(train_input[x], train_output[x])

# Feed-forward network with a single hidden layer of 25 units.
fnn = buildNetwork(ds.indim, 25, ds.outdim, bias=True)
trainer = BackpropTrainer(fnn, ds, learningrate=0.01, momentum=0.1)

# Train for 100000 epochs, reporting the error every 10000 epochs.
for epoch in range(100000):
    error = trainer.train()
    if epoch % 10000 == 0:
        print 'Epoch: ', epoch
        print 'Error: ', error

# Run the trained network over the test inputs.
result = numpy.array([fnn.activate(x) for x in test_input])
I can run this by submitting it to Apache Spark with spark-submit, and it works. Without changing the code, however, I assume I gain nothing from Spark, since the whole script presumably still executes as ordinary Python on the driver.
EDIT
I noticed someone voted to close this, so perhaps I'm being too vague. To rephrase my questions:
- If I run this code as a Spark job, without customising it in any way, will it run just the same as if I ran it as a standard Python script?
- To rewrite it so that it is best exploited by Spark, should my key focus be on moving the datasets from numpy arrays to Spark RDDs (see the first sketch below)?
- How would I change the for loop that actually trains the network so that it runs in parallel via Spark (see the second sketch below)?
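To make the second question concrete, here is the kind of conversion I imagine. This is only a sketch under my own assumptions: sc is a SparkContext I create myself, the numpy arrays from the script above are already loaded, and the partition count of 4 is an arbitrary choice, not anything Spark requires.

from pyspark import SparkContext

sc = SparkContext(appName='pybrain-on-spark')

# Pair each input row with its target so that one RDD element is one
# complete training sample, then distribute the pairs across the cluster.
train_rdd = sc.parallelize(list(zip(train_input, train_output)), numSlices=4)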
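For the third question, the only scheme I can come up with is crude parameter averaging: train an independent network on each partition, then average the weight vectors back on the driver. A rough sketch, reusing the imports from the script above and the hypothetical train_rdd from the previous sketch, and assuming PyBrain is installed on every worker node. net.params and _setParameters come from PyBrain's ParameterContainer; averaging independently initialised networks may not converge well, so treat this as a starting point rather than a solution:

def train_partition(samples):
    # Each worker builds and trains its own network on its slice of the data.
    ds = SupervisedDataSet(2, 1)
    for inp, target in samples:
        ds.addSample(inp, target)
    net = buildNetwork(ds.indim, 25, ds.outdim, bias=True)
    trainer = BackpropTrainer(net, ds, learningrate=0.01, momentum=0.1)
    for _ in range(1000):
        trainer.train()
    # Ship the flat weight vector back to the driver.
    yield net.params.copy()

# Collect one weight vector per partition and average them on the driver.
all_params = train_rdd.mapPartitions(train_partition).collect()
averaged = numpy.mean(all_params, axis=0)

# Load the averaged weights into a network of the same shape.
fnn = buildNetwork(2, 25, 1, bias=True)
fnn._setParameters(averaged)

Whether something like this actually beats single-machine training presumably depends on the partition sizes and on repeating the train-and-average round more than once, which is part of what I'm asking.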