Implementing a Siamese Network in TensorFlow

Question

I want to implement a Siamese convolutional neural network, in which two images share weights in the convolutional layers and their features are then concatenated before being passed through the fully connected layers. I have tried an implementation, but it feels like a hack. In particular, I have defined an operation on tensors as a plain Python function, and I'm not sure whether this is allowed.

Here is the code I have tried:

images = tf.placeholder(tf.float32, shape=[None, 64 * 64])
# Convolutional layers
# ...
# ...
# Results in pool3_flat, which is the flattened output of the third convolutional layer
pool3_flat = tf.reshape(pool3, [-1, 8 * 8 * 128])

# Now, merge the image pairs, where each pair is composed of adjacent images in the batch, with a stride of 2
def merge_pairs():
  # Create a tensor to store the merged image pairs
  # The batch size is 128, therefore there will be 64 pairs (64 in the first dimension of this tensor)
  merged_pairs = tf.Variable(tf.zeros([64, 8 * 8 * 128]))
  # Split the images into 64 pairs
  pairs = tf.split(0, 64, pool3_flat)
  # For each pair, concatenate the two images across dimension 1, and set this tensor in the appropriate row of merged_pairs
  for pair_num, pair in enumerate(pairs):
      merged_pair = tf.concat(1, pair)
      merged_pairs[pair_num] = merged_pair
  return merged_pairs


# Proceed with operations on the merged_pair tensor, as if the batch size is 64
fc4 = tf.matmul(merge_pairs(), weights4)
# ...
# ...

Whilst this compiles and seems to run fine, the results are not really as expected. So, I'm wondering if there is a better way to implement a Siamese network using built-in operations in TensorFlow?

score 5 · Accepted Answer · edited Sep 25 '17 at 22:13

You can use tf.pack and tf.unpack (renamed tf.stack and tf.unstack from TensorFlow 1.0 onward), roughly like this:

pairs = tf.pack(tf.split(0, 64, pool3_flat))
left, right = tf.unpack(tf.transpose(pairs, perm=[1,0,2]))
merged_pairs = tf.concat(1, [left, right])
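
To make the shape flow of the three lines above easier to follow, here is a minimal sketch that mirrors them in NumPy rather than TensorFlow. The sizes (6 images, 4 features) and the name merged_by_reshape are assumptions chosen for illustration; the question uses 128 images and 8 * 8 * 128 features.

```python
import numpy as np

# Hypothetical small sizes for illustration: 6 images (3 pairs), 4 features each.
batch_size, num_features = 6, 4
num_pairs = batch_size // 2
pool3_flat = np.arange(batch_size * num_features, dtype=np.float32).reshape(
    batch_size, num_features
)

# Mirrors tf.split + tf.pack: one [2, num_features] block per pair, stacked.
pairs = np.stack(np.split(pool3_flat, num_pairs, axis=0))  # [num_pairs, 2, F]

# Mirrors tf.transpose + tf.unpack: first and second image of every pair.
left, right = np.transpose(pairs, (1, 0, 2))  # each [num_pairs, F]

# Mirrors tf.concat along dimension 1.
merged_pairs = np.concatenate([left, right], axis=1)  # [num_pairs, 2 * F]

# Because adjacent rows form a pair, a row-major reshape gives the same result.
merged_by_reshape = pool3_flat.reshape(num_pairs, 2 * num_features)
```

Since each pair consists of adjacent rows, a single reshape of pool3_flat to [num_pairs, 2 * num_features] should produce the same merged tensor, which may be simpler if the pairing never changes.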

A cleaner way to do this is to keep your pairs separate from the beginning, so that you can define two networks and use the same trainable variables in each network.

You would have something like (skipping the convolutional layers):

image_left = tf.placeholder(tf.float32, shape=[None, 64, 64, 1])
image_right = tf.placeholder(tf.float32, shape=[None, 64, 64, 1])

pool_left = tf.nn.max_pool(image_left, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
pool_right = tf.nn.max_pool(image_right, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

pool_flat_left = tf.reshape(pool_left, [-1, 32*32])
pool_flat_right = tf.reshape(pool_right, [-1, 32*32])

Then simply concatenate left and right along dimension 1.

concat_layer = tf.concat(1, [pool_flat_left, pool_flat_right])

This way you can also vary the batch size later. Make sure to use the same weights and biases on each side (left and right).
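
As a rough illustration of what "the same weights and biases on each side" means, here is a small NumPy sketch, not TensorFlow code. The names branch, shared_weights and shared_biases, the single dense layer with ReLU, and the layer sizes are all assumptions made for the example; in TensorFlow the same effect would come from creating the variables once and using them in both branches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 32 * 32 flattened inputs as in the answer, 16 outputs.
in_features, out_features = 32 * 32, 16

# One set of parameters, created once and used by both branches.
shared_weights = rng.normal(scale=0.01, size=(in_features, out_features))
shared_biases = np.zeros(out_features)


def branch(pool_flat):
    """Apply the shared layer (matmul + bias + ReLU) to one side of the pair."""
    return np.maximum(pool_flat @ shared_weights + shared_biases, 0.0)


batch = 5  # nothing below hard-codes the batch size
pool_flat_left = rng.normal(size=(batch, in_features))
pool_flat_right = rng.normal(size=(batch, in_features))

# Both sides go through the same function, and therefore the same parameters.
features_left = branch(pool_flat_left)
features_right = branch(pool_flat_right)

# Concatenate along dimension 1, as in the answer.
concat_layer = np.concatenate([features_left, features_right], axis=1)
```

Because both sides call the same function, an update to shared_weights affects the left and right branches identically, and feeding the same image to both sides yields identical halves in concat_layer.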

Ah yes, that makes much more sense actually to explicitly write separate operations for the two images. It's a little bit more code, but makes it much more manageable! — Karnivaurus, Apr 08 '16 at 13:49
Is it correct as your image_left for pooling of right? `pool_right = tf.nn.max_pool(image_left, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')` — Tahir Shahzad, Aug 17 '17 at 10:35
