How to implement next_batch() function for custom data in python

Question

I am currently working on the cats vs dogs classification task on kaggle by implementing a deep convNet. The following code is used for data preprocessing:

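import os
from random import shuffle

import cv2
import numpy as np
from tqdm import tqdm

# TRAIN_DIR (training image folder) and IMG_SIZE (side length in pixels)
# are assumed to be defined earlier in the script.

def label_img(img):
   # The class name is the third field from the end, e.g. 'cat.0.jpg' -> 'cat'
   word_label = img.split('.')[-3]
   if word_label == 'cat': return [1,0]
   elif word_label == 'dog': return [0,1]

def create_train_data():
   training_data = []
   for img in tqdm(os.listdir(TRAIN_DIR)):
      label = label_img(img)
      path = os.path.join(TRAIN_DIR,img)
      # cv2.resize takes the target size as a (width, height) tuple
      img = cv2.resize(cv2.imread(path,cv2.IMREAD_GRAYSCALE),(IMG_SIZE,IMG_SIZE))
      training_data.append([np.array(img),np.array(label)])

   shuffle(training_data)
   return training_data

train_data = create_train_data()

# Images as (N, IMG_SIZE, IMG_SIZE, 1), labels as one-hot rows of shape (N, 2)
X_train = np.array([i[0] for i in train_data]).reshape(-1, IMG_SIZE,IMG_SIZE,1)
Y_train = np.asarray([i[1] for i in train_data])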

I want to implement a function that replicates the following function provided in the tensorflow deep MNIST tutorial:

batch = mnist.train.next_batch(100)

score 3 · Accepted Answer · answered Jun 16 '17 at 05:19


Apart from generating batches, you may also want to randomly re-arrange the data at the start of each epoch, so that the batches differ from one epoch to the next.

EPOCH = 100
BATCH_SIZE = 128
TRAIN_DATASIZE,_,_,_ = X_train.shape
# Number of full batches per epoch; integer division, so a trailing partial batch is skipped
PERIOD = TRAIN_DATASIZE // BATCH_SIZE

# sess, train, X and Y come from your own tensorflow graph
for e in range(EPOCH):
    idxs = np.random.permutation(TRAIN_DATASIZE) #shuffled ordering, redrawn every epoch
    X_random = X_train[idxs]
    Y_random = Y_train[idxs]
    for i in range(PERIOD):
        batch_X = X_random[i * BATCH_SIZE:(i+1) * BATCH_SIZE]
        batch_Y = Y_random[i * BATCH_SIZE:(i+1) * BATCH_SIZE]
        sess.run(train,feed_dict = {X: batch_X, Y:batch_Y})
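
If the dataset size is not a multiple of BATCH_SIZE, a loop over full batches only never visits the leftover samples in a given epoch. A minimal sketch of the same shuffling idea wrapped in a generator, which also yields the final smaller batch, could look like this. The name iterate_minibatches and its parameters are hypothetical (not part of tensorflow), and the small arrays are made-up stand-ins for X_train and Y_train.

```python
import numpy as np

def iterate_minibatches(inputs, targets, batch_size, rng=None):
    """Yield (inputs, targets) batches for one shuffled pass over the data."""
    assert len(inputs) == len(targets)
    rng = np.random.default_rng() if rng is None else rng
    idxs = rng.permutation(len(inputs))  # one shuffled ordering per pass
    for start in range(0, len(inputs), batch_size):
        batch_idxs = idxs[start:start + batch_size]
        # The same indices are applied to both arrays, so pairs stay aligned
        yield inputs[batch_idxs], targets[batch_idxs]

# Made-up stand-in data: 10 "images" of shape 4x4x1 with one-hot labels
X_demo = np.arange(10 * 16, dtype=np.float32).reshape(-1, 4, 4, 1)
Y_demo = np.eye(2, dtype=np.int64)[np.arange(10) % 2]

batch_shapes = [bx.shape for bx, by in iterate_minibatches(X_demo, Y_demo, batch_size=4)]
print(batch_shapes)  # [(4, 4, 4, 1), (4, 4, 4, 1), (2, 4, 4, 1)]
```

In the training loop you would call it once per epoch, for example "for batch_X, batch_Y in iterate_minibatches(X_train, Y_train, BATCH_SIZE):", and feed each pair to sess.run as before.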

Joshua Lim


Thank you so much. Finally I can train my network properly. – Kaustabh Kakoty Jun 16 '17 at 11:35
Can you enlighten me on what the next_batch() of tensorflow returns? Is it a random collection of data from the training set of the specified batch size? If so then does it ensure non repetition? @Joshua Lim – Kaustabh Kakoty Jun 16 '17 at 12:28
next_batch() is a function specific to the MNIST tutorial provided by tensorflow. It shuffles the training image and label pairs at the beginning, and each call returns the next 100 pairs (or whatever batch size you pass) in that shuffled order. Once it reaches the end, the pairs are shuffled again and the process repeats. So no pair is repeated within a pass: the dataset is only reshuffled and reused once all the available pairs have been used. – Joshua Lim Jun 19 '17 at 05:50
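
The behaviour described in this comment can be sketched for custom arrays as a small stateful class. This is only an illustration under stated assumptions: BatchFeeder, its seed parameter and the epochs_completed counter are hypothetical names, and the tutorial's own implementation may handle the leftover pairs at the end of a pass differently. Here they are simply discarded.

```python
import numpy as np

class BatchFeeder:
    """Hypothetical helper: shuffle once, hand out consecutive batches,
    and reshuffle when the current pass over the data is exhausted."""

    def __init__(self, inputs, targets, seed=None):
        assert len(inputs) == len(targets)
        self._inputs = inputs
        self._targets = targets
        self._rng = np.random.default_rng(seed)
        self._order = self._rng.permutation(len(inputs))
        self._pos = 0
        self.epochs_completed = 0

    def next_batch(self, batch_size):
        assert batch_size <= len(self._order)
        # Assumption: if fewer than batch_size pairs remain, drop them
        # and start a freshly shuffled pass.
        if self._pos + batch_size > len(self._order):
            self._order = self._rng.permutation(len(self._order))
            self._pos = 0
            self.epochs_completed += 1
        idxs = self._order[self._pos:self._pos + batch_size]
        self._pos += batch_size
        return self._inputs[idxs], self._targets[idxs]

# Made-up stand-in data: 10 samples, each label is twice its input
feeder = BatchFeeder(np.arange(10), np.arange(10) * 2, seed=0)
batch_x, batch_y = feeder.next_batch(5)
print(batch_x.shape, batch_y.shape)  # (5,) (5,)
```

With the arrays from the question, "feeder = BatchFeeder(X_train, Y_train)" followed by "batch_X, batch_Y = feeder.next_batch(100)" would play the role of mnist.train.next_batch(100).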

score 0 · Answer 2 · answered Jun 15 '17 at 23:24

This code is a good example of how to write a function that generates batches.

In brief, you allocate two arrays, one for the inputs (from x_train) and one for the labels (from y_train), like:

  batch_inputs = np.ndarray(shape=(batch_size), dtype=np.int32)
  batch_labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)

Then fill them with the training data:

  batch_inputs[i] = ...
  batch_labels[i, 0] = ...

Finally, pass the batch to the session:

_, loss_val = session.run([optimizer, loss], feed_dict={train_inputs: batch_inputs, train_labels:batch_labels})

Will try this out. Thanks for your time. – Kaustabh Kakoty Jun 16 '17 at 11:36
