
I am trying to return only the center crop from Caffe's oversample function using Python, i.e. the caffe.io.oversample function (not the one in classifier.py). I have tried to modify the code to return only the center crop, but it still returns 10 crops instead of 1. I have rebuilt caffe and pycaffe, but the situation is still the same. How can I get the Python code to return only one crop? I am confused. I have used matcaffe before, but I don't know Python; I am just figuring it out as I go. Thanks!

Modified part:

def oversample(images, crop_dims, flow=False):
  """
  Crop images into the center crop only.

  Take
  images: iterable of (H x W x K) ndarrays
  crop_dims: (height, width) tuple for the crops.

  Give
  crops: (N x H x W x K) ndarray of center crops for number of inputs N.
  """
  # Dimensions and center.
  im_shape = np.array(images[0].shape)
  crop_dims = np.array(crop_dims)
  center = im_shape[:2] / 2.0

  # Make crop coordinates
  # Take only the center crop.
  crop = np.tile(center, (1, 2))[0] + np.concatenate([
      -crop_dims / 2.0,
      crop_dims / 2.0
  ])
  crop = crop.astype(int)  # slicing indices must be integers
  crops = np.asarray(images)[:, crop[0]:crop[2], crop[1]:crop[3], :]

  return crops
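
For reference, a minimal sketch of how this modified function could be exercised; the 256x256 frame size, the 227x227 crop, and the 16-frame batch are assumptions for illustration, not values from the original code:

import numpy as np

# hypothetical check of the center-crop-only version
frames = [np.random.rand(256, 256, 3).astype(np.float32) for _ in range(16)]
crops = oversample(frames, (227, 227))
print(crops.shape)  # (16, 227, 227, 3) rather than (160, 227, 227, 3)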

The original version:

def oversample(images, crop_dims, flow=False):
  """
  Crop images into the four corners, center, and their mirrored versions.

  Take
  image: iterable of (H x W x K) ndarrays
  crop_dims: (height, width) tuple for the crops.

  Give
  crops: (10*N x H x W x K) ndarray of crops for number of inputs N.
  """
  # Dimensions and center.
  im_shape = np.array(images[0].shape)
  crop_dims = np.array(crop_dims)
  im_center = im_shape[:2] / 2.0

  # Make crop coordinates
  h_indices = (0, im_shape[0] - crop_dims[0])
  w_indices = (0, im_shape[1] - crop_dims[1])
  crops_ix = np.empty((5, 4), dtype=int)
  curr = 0
  for i in h_indices:
      for j in w_indices:
          crops_ix[curr] = (i, j, i + crop_dims[0], j + crop_dims[1])
          curr += 1
  crops_ix[4] = np.tile(im_center, (1, 2)) + np.concatenate([
      -crop_dims / 2.0,
       crop_dims / 2.0
  ])
  crops_ix = np.tile(crops_ix, (2, 1))

  # Extract crops
  crops = np.empty((10 * len(images), crop_dims[0], crop_dims[1],
                        im_shape[-1]), dtype=np.float32)
  ix = 0
  for im in images:
      for crop in crops_ix:
          crops[ix] = im[crop[0]:crop[2], crop[1]:crop[3], :]
          ix += 1
      crops[ix-5:ix] = crops[ix-5:ix, :, ::-1, :]  # flip for mirrors
      if flow:  #if using a flow input, should flip first channel which  corresponds to x-flow
        crops[ix-5:ix,:,:,0] = 1-crops[ix-5:ix,:,:,0]
  return crops
  • it makes no sense using `oversample` if you only want the central crop. What are you trying to do? – Shai May 19 '16 at 05:38
  • I am only trying to decrease the memory usage, that is all. I have decreased the batch size to 16, which I want to keep since I am using sequences of frames for an RNN, but it still requires more memory. I have a Tesla with compute capability 2.0, so I thought taking only one crop, or just using a resized version of the image, should help with the memory requirement (even though it might lower accuracy). All in all, I am trying to make minimal changes to the code because I don't know Python. – dusa May 19 '16 at 17:09
  • train time or test time? – Shai May 19 '16 at 17:17
  • test time, I have lowered the batches in training and it worked. I am using this lstm caffe extension https://github.com/LisaAnne/lisa-caffe-public/tree/lstm_video_deploy and trying to classify a video to see if the lstm model I have trained works, but I can't: it takes 16 frames at once (and the 10 crops), which is more than my Tesla can handle. – dusa May 19 '16 at 17:59

1 Answer


I was trying to do the same. In the end, I had to use array slicing to crop the image:

def crop_center(img, cropx, cropy):
    _, y, x = img.shape
    startx = x // 2 - (cropx // 2)
    starty = y // 2 - (cropy // 2)
    return img[:, starty:starty + cropy, startx:startx + cropx]

The input to the function is a (C, H, W) array.
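
A quick usage sketch, assuming the image is already in Caffe's (C, H, W) float layout (e.g. the output of caffe.io.Transformer.preprocess); the 256x256 size and 227x227 crop are made-up values for illustration:

import numpy as np

# hypothetical (C, H, W) input, e.g. a preprocessed 3x256x256 frame
img = np.random.rand(3, 256, 256).astype(np.float32)

# take a single 227x227 center crop instead of the 10 oversampled crops
crop = crop_center(img, 227, 227)
print(crop.shape)  # (3, 227, 227)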