A similar question has been asked here: how to train multiple neural networks simultaneously. However, the answers there were specific to Caffe. Here is my specific question:
A friend of mine has designed an RNN for a certain problem using Theano and TensorFlow. It has 14 input nodes, 2 hidden layers with 7 nodes each, and a single output node. We have around 30,000 such RNNs that need to be trained. I am a software engineer with very little exposure to machine learning, and what I need to do is speed up the training of these RNNs.
Looking at the problem from a CS perspective, I don't think anything can be done to speed up the training of a single RNN; running such a small RNN on a GPU makes no sense. Instead, we should be able to get a speedup by batching the RNNs, say 1000 at a time, and sending them to the GPU together. The nature of the problem is SIMD: every RNN is identical, but each one has to train on a different data set.
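To make the intent concrete, below is the kind of batched computation I have in mind. It is only a sketch of my assumption about how 1000 identical networks could share one graph, not working code; the shapes, the per-network weight tensor, and the tf.batch_matmul call are all my guesses:

import tensorflow as tf

N = 1000  # number of identical networks trained together (assumption)
# one input sample per network per step
batched_inputs = tf.placeholder(tf.float32, [N, 1, 14])
# a separate 14x7 input weight matrix for every network, stacked into one tensor
batched_W = tf.Variable(tf.truncated_normal([N, 14, 7]))
# all N tiny matrix multiplications done as one batched op on the GPU
batched_hidden = tf.tanh(tf.batch_matmul(batched_inputs, batched_W))  # shape [N, 1, 7]

If the whole network could be expressed this way, the GPU would see a few large batched operations instead of 30,000 tiny ones, which is the SIMD-style speedup I am hoping for.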
Can someone please explain how this could be done using Theano or TensorFlow?
Here is the code for a single model:
import pandas as pd
# b holds the raw data rows loaded earlier (not shown)
df = pd.DataFrame(b, columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
                              'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T'])
# group by A, Q, R, sort each group by S, and flatten back to a list of rows
ds = df.groupby(['A', 'Q', 'R']).apply(lambda h: h.sort_values('S')).values.tolist()
import math
# derived feature: log(1+x)/x of the previous row's target (column index 19),
# reset to 0 whenever the id in column A changes
stationary_id = 0
sale_from_previous_day = []
for i in xrange(0, len(ds)):
    if ds[i][0] != stationary_id:
        stationary_id = ds[i][0]
        sale_from_previous_day.append(0)
    else:
        if float(ds[i-1][19]) == 0:
            sale_from_previous_day.append(0)
        else:
            sale_from_previous_day.append(math.log(1 + float(ds[i-1][19])) / float(ds[i-1][19]))
import numpy as np
import tensorflow as tf
import time
from tensorflow.python.ops import rnn_cell
# create a placeholder for input layer
input_layer = tf.placeholder(tf.float32, [1, 14])
# no. of neurons & layers
num_hidden = 7
num_layers = 2
# Construct Multilayer RNN
network = rnn_cell.BasicRNNCell(num_hidden)
network1 = rnn_cell.MultiRNNCell([network] * num_layers)
# The hidden state as a Variable initialized to zeroes
state1 = tf.Variable(tf.zeros([1, network1.state_size]))
# Connect the input layer and initial hidden state to the rnn cell
output1, state_output1 = network1(input_layer, state1)
# update the state
update_op1 = state1.assign(state_output1)
# hidden-to-output weights
output_W1 = tf.Variable(tf.truncated_normal([num_hidden, 1]))
# output bias
output_b1 = tf.Variable(tf.zeros([1]))
# linear output layer produces the predicted value
final_output = tf.matmul(output1, output_W1) + output_b1
# placeholder for the correct output (for training)
correct_output = tf.placeholder(tf.float32, [1, 1])
# squared error for the single sample
error = tf.pow(tf.sub(final_output, correct_output), 2)
# Adam optimizer with learning rate 0.0006
train_step = tf.train.AdamOptimizer(0.0006).minimize(error)
##session
sess = tf.Session(config=tf.ConfigProto(inter_op_parallelism_threads=1,
intra_op_parallelism_threads=1))
#Initialize all Variables
sess.run(tf.initialize_all_variables())
# start time in milliseconds, used to report total training time at the end
m1 = int(round(time.time() * 1000))
for epoch in range(0, 7):
    er = 0
    pon = 0
    for i in range(len(ds)):
        # 13 raw columns plus the engineered previous-day feature -> 14 inputs; column 19 is the target
        a, b = np.array([[ds[i][1], ds[i][2], ds[i][3], ds[i][4], ds[i][5], ds[i][6],
                          ds[i][7], ds[i][8], ds[i][9], ds[i][10], ds[i][11], ds[i][12],
                          ds[i][14], sale_from_previous_day[i]]]), np.array([[ds[i][19]]])
        _, _, network_output = sess.run([update_op1, train_step, final_output],
                                        feed_dict={input_layer: a, correct_output: b})
        er += 0.5 * ((b[0][0]) - (network_output[0][0])) ** 2
        pon += 1
    # average error for this epoch
    print er / pon
# total training time in seconds
print (int(round(time.time() * 1000)) - m1) / 1000.0