
I'm trying to modify the code of cifar10.py so that I can feed my own images to the network.

I'm able to run the code and start the training process, but after some time, if I open TensorBoard, the "images" section always shows the same image. Moreover, the cross-entropy goes to zero. I think I'm loading the images incorrectly.

Here's the code

   def distorted_inputs():
      # Read the file listing the directories where the images are stored
      filedirs = [line.rstrip('\n') for line in open('image_dirs.txt')]

      # Create a list of "filename label" strings
      filenames = []
      i = 0
      for f in filedirs:
         png_files_path = glob.glob(os.path.join(f, '*.[pP][nN][gG]'))
         print('found ' + str(len(png_files_path)) + ' files in ' + f)
         for filename in png_files_path:
            # store "file_name label"
            filenames.append(filename + " " + str(i))
         i = i + 1

      # Create a queue that produces the filenames to read and the labels
      filename_queue = tf.train.string_input_producer(filenames)

      my_img, label = read_my_file_format(filename_queue.dequeue())
      label = tf.string_to_number(label, out_type=tf.int32)

      init_op = tf.initialize_all_variables()
      with tf.Session() as sess:
         sess.run(init_op)

         # Start populating the filename queue.
         coord = tf.train.Coordinator()
         threads = tf.train.start_queue_runners(coord=coord)

         image = my_img.eval()

         coord.request_stop()
         coord.join(threads)

      reshaped_image = tf.cast(image, tf.float32)

      resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image, IMAGE_SIZE, IMAGE_SIZE)

      distorted_image = tf.image.random_crop(reshaped_image, [24, 24])

      # Randomly flip the image horizontally.
      distorted_image = tf.image.random_flip_left_right(distorted_image)

      # Because these operations are not commutative, consider randomizing
      # the order of their operation.
      distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
      distorted_image = tf.image.random_contrast(distorted_image, lower=0.2, upper=1.8)

      # Subtract off the mean and divide by the variance of the pixels.
      float_image = tf.image.per_image_whitening(distorted_image)

      # Ensure that the random shuffling has good mixing properties.
      min_fraction_of_examples_in_queue = 0.4
      min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN * min_fraction_of_examples_in_queue)
      print('Filling queue with ITSD images before starting to train. '
            'This will take a few minutes.')

      # Generate a batch of images and labels by building up a queue of examples.
      return _generate_image_and_label_batch(float_image, label, min_queue_examples)

The image-reading part comes from https://github.com/HamedMP/ImageFlow. The custom reader comes from the question "Tensorflow read images with labels", and the corresponding function is implemented as follows:

 def read_my_file_format(filename_and_label_tensor):
  """Consumes a single filename and label as a ' '-delimited string.

  Args:
    filename_and_label_tensor: A scalar string tensor.

  Returns:
    Two tensors: the decoded image, and the string label.
  """
  filename, label = tf.decode_csv(filename_and_label_tensor, [[""], [""]], " ")

  file_contents = tf.read_file(filename)
  example = tf.image.decode_png(file_contents)
  return example, label
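For reference, each element in the filename queue is just a `"<filename> <label>"` string. This is a plain-Python sketch of the split that `tf.decode_csv` performs here (`split_filename_label` and the example path are hypothetical, not part of the pipeline):

```python
def split_filename_label(s):
    """Split a '<filename> <label>' record into its two fields.

    Mirrors tf.decode_csv(record, [[""], [""]], " "): exactly two
    space-delimited fields are expected, so a filename containing a
    space would break both this helper and the TF op.
    """
    filename, label = s.split(" ")
    return filename, int(label)

# hypothetical example record
filename, label = split_filename_label("/data/class0/img_001.png 0")
```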

Thanks

Dario

1 Answer


You can use this piece of code, which I wrote for my classification problem:

        # (this snippet runs inside a loop over the image files of one class;
        #  `image` is the current image as read by cv2)
        resized_image = cv2.resize(image, (WIDTH, HEIGHT))
        label = np.uint8(nclass)

        arr = np.uint8([0 for x in range(image_bytes)])
        #  fill the label:
        arr[0] = label
        arr_cnt = 1

        #  fill the image (row-major order): first R values, then G values, then B values
        for y in range(0, HEIGHT):
            for x in range(0, WIDTH):
                arr[arr_cnt] = np.uint8(resized_image[y, x, 2])         # R
                arr[arr_cnt + 1024] = np.uint8(resized_image[y, x, 1])  # G
                arr[arr_cnt + 2048] = np.uint8(resized_image[y, x, 0])  # B
                arr_cnt += 1

        print "train arr:", arr[0], arr[3072]
        train_arr = np.append(train_arr, arr)
    else:
        #  cv2 could not read the file
        invalids_cnt += 1

#  Write the accumulated array to a binary batch file:
with open('data_batch_%d.bin' % nclass, 'wb') as f:
    f.write(train_arr)

Here, `resized_image` is the resized version of one input image, `image`. Next, I create an array of 3073 bytes: the 1st byte is the label, the next 1024 bytes are the red values of the image, the next 1024 bytes the green values, and the last 1024 bytes the blue values.
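If you prefer to avoid the per-pixel loop, the same 3073-byte layout can be built with NumPy slicing. This is a sketch assuming a 32×32 BGR image as returned by cv2 (`pack_record` is a hypothetical name):

```python
import numpy as np

def pack_record(image_bgr, label):
    """Pack one 32x32 BGR image into 1 label byte + 1024 R + 1024 G + 1024 B bytes."""
    r = image_bgr[:, :, 2].reshape(-1)  # red plane, row-major
    g = image_bgr[:, :, 1].reshape(-1)  # green plane
    b = image_bgr[:, :, 0].reshape(-1)  # blue plane
    return np.concatenate(([label], r, g, b)).astype(np.uint8)

# stand-in for a cv2 image
image = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
record = pack_record(image, 5)  # record.shape == (3073,), record[0] == 5
```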

I do this for every input image and then concatenate the records into one big binary array, which is written to a binary file "data_batch_%d.bin".
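To sanity-check the resulting batch file, the records can be read back and reassembled into images. This is a sketch assuming the 3073-byte layout described above (`unpack_records` is a hypothetical helper):

```python
import numpy as np

RECORD_BYTES = 3073  # 1 label byte + 3 * 1024 pixel bytes

def unpack_records(raw):
    """Split a concatenated byte buffer into (label, 32x32x3 RGB image) pairs."""
    records = np.frombuffer(raw, dtype=np.uint8).reshape(-1, RECORD_BYTES)
    pairs = []
    for rec in records:
        label = int(rec[0])
        # three 32x32 planes (R, G, B) -> one HxWx3 image
        image = rec[1:].reshape(3, 32, 32).transpose(1, 2, 0)
        pairs.append((label, image))
    return pairs

# usage: pairs = unpack_records(open('data_batch_1.bin', 'rb').read())
```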

I have posted my complete script (maybe harder to understand for general purposes) in this gist: gist

Twimnox
  • Thanks for your answer. I was looking for a way to load image formats directly rather than converting them into binary first and then feeding the network with them. Nevertheless, for my project I ended up following the same solution as yours. – Dario Mar 09 '16 at 14:30
  • I believe that converting them to a binary format will improve the model's performance, since it's a much more "raw" format. – Twimnox Mar 09 '16 at 14:43