
I have a basic, working neural network implementation in PyBrain:

import numpy
from pybrain.datasets import SupervisedDataSet
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer

# Load the raw training and test data from CSV files
train_input = numpy.loadtxt('train_input.csv', delimiter=',')
test_input = numpy.loadtxt('test_input.csv', delimiter=',')
train_output = numpy.loadtxt('train_output.csv', delimiter=',')
test_output = numpy.loadtxt('test_output.csv', delimiter=',')

# Scale each column to the [0, 1] range
train_input = train_input / train_input.max(axis=0)
test_input = test_input / test_input.max(axis=0)
train_output = train_output / train_output.max(axis=0)
test_output = test_output / test_output.max(axis=0)

# Two input features, one output target
ds = SupervisedDataSet(2, 1)

# Add every row as a training sample
for x in range(len(train_input)):
    ds.addSample(train_input[x], train_output[x])


# Feed-forward network with a single hidden layer of 25 units
fnn = buildNetwork(ds.indim, 25, ds.outdim, bias=True)
trainer = BackpropTrainer(fnn, ds, learningrate=0.01, momentum=0.1)

# Train for 100000 epochs, reporting the error every 10000
for epoch in range(100000):
    error = trainer.train()
    if epoch % 10000 == 0:
        print 'Epoch: ', epoch
        print 'Error: ', error

# Activate the trained network on each test row
result = numpy.array([fnn.activate(x) for x in test_input])

I can run this by submitting it to Apache Spark and it works. However, without changing the code, I assume I gain nothing from Spark.
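For reference, this is roughly how I submit it (the script name and master setting here are just examples):

spark-submit --master local[4] pybrain_nn.py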

EDIT

I noticed someone voted to close this, so perhaps I'm being too vague. To rephrase my questions:

  • If I run this code as a Spark job, without customising it in any way, will it run just the same as if I ran it as a standard Python script?
  • To rewrite it so that Spark can best exploit it, should my key focus be on moving the datasets from NumPy arrays to Spark RDDs?
  • How would I change the for loop that actually trains the network so that it runs in parallel via Spark? (My rough, untested idea is sketched after this list.)
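The closest I have got to a plan is the sketch below. It only pushes the activation of the already-trained network over the test set into an RDD, not the backprop loop itself, and it assumes the fnn object can be pickled into the worker closures (the app name is a placeholder):

from pyspark import SparkContext

sc = SparkContext(appName='pybrain-nn')  # placeholder app name

# Distribute the test rows and run the trained network on each one.
# Assumes fnn can be serialised to the workers.
test_rdd = sc.parallelize(list(test_input))
result = numpy.array(test_rdd.map(lambda row: fnn.activate(row)).collect())

Is this the right direction, or is there a better-suited way to structure the training loop itself?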
