
I am really new to machine learning and I am currently using Tensorflow Object Detection API to perform object detection, and the model I use is faster_rcnn_resnet101.

What I am looking for is the Python code that defines the architecture, such as the number of layers (like the code I attached below, which is from the TensorFlow tutorial at https://cv-tricks.com/tensorflow-tutorial/training-convolutional-neural-network-for-image-classification/). TensorFlow is not like YOLO, where I can easily find where the architecture is defined...

Thank you so much for your help! Where can I find the file that defines the faster_rcnn_resnet101 architecture?

def create_convolutional_layer(input,
                               num_input_channels,
                               conv_filter_size,
                               num_filters):

    ## We shall define the weights that will be trained using create_weights function.
    weights = create_weights(shape=[conv_filter_size, conv_filter_size, num_input_channels, num_filters])
    ## We create biases using the create_biases function. These are also trained.
    biases = create_biases(num_filters)

    ## Creating the convolutional layer
    layer = tf.nn.conv2d(input=input,
                         filter=weights,
                         strides=[1, 1, 1, 1],
                         padding='SAME')

    layer += biases

    ## We shall be using max-pooling.
    layer = tf.nn.max_pool(value=layer,
                           ksize=[1, 2, 2, 1],
                           strides=[1, 2, 2, 1],
                           padding='SAME')
    ## Output of pooling is fed to Relu which is the activation function for us.
    layer = tf.nn.relu(layer)

    return layer
Mahdi
luelue

2 Answers


The TensorFlow Object Detection API uses feature extraction, which means using the representations learned by a previous network to extract meaningful features from new samples.

The Faster_RCNN_ResNet_101 feature extractor is defined in this class: https://github.com/tensorflow/models/blob/master/research/object_detection/models/faster_rcnn_resnet_v1_feature_extractor.py

class FasterRCNNResnet101FeatureExtractor(FasterRCNNResnetV1FeatureExtractor):
  """Faster R-CNN Resnet 101 feature extractor implementation."""

  def __init__(self,
               is_training,
               first_stage_features_stride,
               batch_norm_trainable=False,
               reuse_weights=None,
               weight_decay=0.0):
    """Constructor.
    Args:
      is_training: See base class.
      first_stage_features_stride: See base class.
      batch_norm_trainable: See base class.
      reuse_weights: See base class.
      weight_decay: See base class.
    Raises:
      ValueError: If `first_stage_features_stride` is not 8 or 16,
        or if `architecture` is not supported.
    """
    super(FasterRCNNResnet101FeatureExtractor, self).__init__(
        'resnet_v1_101', resnet_v1.resnet_v1_101, is_training,
        first_stage_features_stride, batch_norm_trainable,
        reuse_weights, weight_decay)

At the top of the full file there is the line from object_detection.meta_architectures import faster_rcnn_meta_arch, so the general TensorFlow implementation of Faster R-CNN detection models is probably defined in https://github.com/tensorflow/models/blob/master/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py
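If the package is installed locally, you can also ask Python where a class is defined instead of browsing GitHub. This is a minimal sketch: source_file_of is a hypothetical helper name, the demo uses a standard-library class so it runs without TensorFlow, and the commented-out lines assume the Object Detection API is importable under the module path shown in the URL above.

```python
import inspect
import json  # stand-in module so the demo runs without TensorFlow installed


def source_file_of(obj):
    """Return the path of the .py file that defines a class, function, or module."""
    return inspect.getsourcefile(obj)


# Demo with a standard-library class.
print(source_file_of(json.JSONDecoder))

# Assumption: the Object Detection API is on your PYTHONPATH. If so, the same
# call should point at the feature extractor file discussed above:
# from object_detection.models import faster_rcnn_resnet_v1_feature_extractor as fx
# print(source_file_of(fx.FasterRCNNResnet101FeatureExtractor))
```

The same helper can be pointed at the meta-architecture module or at resnet_v1 to follow the imports down to the layer definitions.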

gameon67

The Object Detection API uses TF-Slim to build its models. TF-Slim is a TensorFlow API that provides many predefined CNNs as well as the building blocks for CNNs. In the Object Detection API, the CNNs are called feature extractors; wrapper classes around them provide a uniform interface across the different model architectures.

For example, the faster_rcnn_resnet101 model uses ResNet-101 as its feature extractor, so there is a corresponding FasterRCNNResnetV1FeatureExtractor wrapper class in the file faster_rcnn_resnet_v1_feature_extractor.py under the models directory.

from nets import resnet_utils
from nets import resnet_v1    
slim = tf.contrib.slim

In this file, you will find that slim is used to build the feature extractors. nets is a module from slim that contains many predefined CNNs. So the model-defining code (the layers) you are looking for is in the nets module; here is an excerpt from the resnet_v1 module.

def resnet_v1_block(scope, base_depth, num_units, stride):
  """Helper function for creating a resnet_v1 bottleneck block.
  Args:
    scope: The scope of the block.
    base_depth: The depth of the bottleneck layer for each unit.
    num_units: The number of units in the block.
    stride: The stride of the block, implemented as a stride in the last unit.
      All other units have stride=1.
  Returns:
    A resnet_v1 bottleneck block.
  """
  return resnet_utils.Block(scope, bottleneck, [{
      'depth': base_depth * 4,
      'depth_bottleneck': base_depth,
      'stride': 1
  }] * (num_units - 1) + [{
      'depth': base_depth * 4,
      'depth_bottleneck': base_depth,
      'stride': stride
  }])


def resnet_v1_50(inputs,
                 num_classes=None,
                 is_training=True,
                 global_pool=True,
                 output_stride=None,
                 spatial_squeeze=True,
                 store_non_strided_activations=False,
                 min_base_depth=8,
                 depth_multiplier=1,
                 reuse=None,
                 scope='resnet_v1_50'):
  """ResNet-50 model of [1]. See resnet_v1() for arg and return description."""
  depth_func = lambda d: max(int(d * depth_multiplier), min_base_depth)
  blocks = [
      resnet_v1_block('block1', base_depth=depth_func(64), num_units=3,
                      stride=2),
      resnet_v1_block('block2', base_depth=depth_func(128), num_units=4,
                      stride=2),
      resnet_v1_block('block3', base_depth=depth_func(256), num_units=6,
                      stride=2),
      resnet_v1_block('block4', base_depth=depth_func(512), num_units=3,
                      stride=1),
  ]
  return resnet_v1(inputs, blocks, num_classes, is_training,
                   global_pool=global_pool, output_stride=output_stride,
                   include_root_block=True, spatial_squeeze=spatial_squeeze,
                   store_non_strided_activations=store_non_strided_activations,
                   reuse=reuse, scope=scope)

The example code above shows how a ResNet-50 model is built (I chose ResNet-50 because it follows the same concept as ResNet-101 but has fewer layers). Note that ResNet-50 has 4 blocks, containing [3, 4, 6, 3] units respectively. Here is a diagram of ResNet-50, where you can see the 4 blocks.

[image: ResNet-50 architecture diagram]
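To connect the units-per-block lists to the "50" and "101" in the model names, here is a small sketch in plain Python (no TensorFlow needed). block_units mirrors the list expression inside resnet_v1_block above. The ResNet-50 unit counts come from the excerpt; the ResNet-101 counts [3, 4, 23, 3] are what I recall from resnet_v1_101 in the same module, so treat them as an assumption and check the source. The helper names are hypothetical.

```python
# Units per block, i.e. the num_units values passed to resnet_v1_block.
UNITS_PER_BLOCK = {
    "resnet_v1_50": [3, 4, 6, 3],    # from resnet_v1_50 in the excerpt above
    "resnet_v1_101": [3, 4, 23, 3],  # assumption: verify against resnet_v1_101
}


def block_units(base_depth, num_units, stride):
    """Mirror the list of unit configs built inside resnet_v1_block."""
    unit = {
        "depth": base_depth * 4,
        "depth_bottleneck": base_depth,
        "stride": 1,
    }
    last_unit = dict(unit, stride=stride)  # only the last unit is strided
    return [unit] * (num_units - 1) + [last_unit]


def count_weighted_layers(units_per_block):
    """Count layers the conventional way: root conv + 3 convs per bottleneck
    unit + final classification layer (shortcut projections are not counted)."""
    root_conv = 1
    convs_per_unit = 3  # 1x1, 3x3, 1x1 inside each bottleneck unit
    final_layer = 1
    return root_conv + convs_per_unit * sum(units_per_block) + final_layer


for name, units in UNITS_PER_BLOCK.items():
    print(name, count_weighted_layers(units))

# block1 of ResNet-50: three units, only the last one has stride 2.
print([u["stride"] for u in block_units(base_depth=64, num_units=3, stride=2)])
```

So the only difference between the two networks, under this assumption, is the number of units in block3.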

That covers the ResNet part. The features extracted by the first-stage feature extractor (ResNet-101) are fed to the proposal generator, which generates regions; these regions, together with the features, are then fed into the box classifier for class prediction and bounding-box regression.

The Faster R-CNN part is specified as a meta-architecture. Meta-architectures are recipes for converting classification architectures into detection architectures, in this case from ResNet-101 to Faster R-CNN. Here is a diagram of the Faster R-CNN meta-architecture (source).

[image: Faster R-CNN meta-architecture diagram]

In the box classifier part of the diagram, there are also pooling operations (on the cropped regions) and convolutional operations (for extracting features from the cropped regions). In faster_rcnn_meta_arch, a max-pool operation is applied to the cropped regions, and the subsequent convolutions are performed in the feature extractor class again, this time for the second stage. You can clearly see another block (block4) being used there:

def _extract_box_classifier_features(self, proposal_feature_maps, scope):
    """Extracts second stage box classifier features.
    Args:
      proposal_feature_maps: A 4-D float tensor with shape
        [batch_size * self.max_num_proposals, crop_height, crop_width, depth]
        representing the feature map cropped to each proposal.
      scope: A scope name (unused).
    Returns:
      proposal_classifier_features: A 4-D float tensor with shape
        [batch_size * self.max_num_proposals, height, width, depth]
        representing box classifier features for each proposal.
    """
    with tf.variable_scope(self._architecture, reuse=self._reuse_weights):
      with slim.arg_scope(
          resnet_utils.resnet_arg_scope(
              batch_norm_epsilon=1e-5,
              batch_norm_scale=True,
              weight_decay=self._weight_decay)):
        with slim.arg_scope([slim.batch_norm],
                            is_training=self._train_batch_norm):
          blocks = [
              resnet_utils.Block('block4', resnet_v1.bottleneck, [{
                  'depth': 2048,
                  'depth_bottleneck': 512,
                  'stride': 1
              }] * 3)
          ]
          proposal_classifier_features = resnet_utils.stack_blocks_dense(
              proposal_feature_maps, blocks)
    return proposal_classifier_features
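To make the data flow concrete, here is a rough shape trace of the two stages in plain Python. It is only a sketch under stated assumptions: the default values (feature stride 16, crop size 14, 2x2 max-pool with stride 2 and no padding, 300 proposals) are typical config values rather than something read from your pipeline, the first-stage depth of 1024 assumes the first stage stops after block3, and the feature map size is approximated as ceil(size / stride). The function name is hypothetical. The 2048 depth comes from the block4 definition above.

```python
import math


def trace_shapes(image_hw,
                 features_stride=16,      # assumed first_stage_features_stride
                 initial_crop_size=14,    # assumed crop size per proposal
                 maxpool_kernel_size=2,   # assumed
                 maxpool_stride=2,        # assumed
                 num_proposals=300):      # assumed max proposals per image
    """Return approximate NHWC tensor shapes through a Faster R-CNN ResNet."""
    height, width = image_hw

    # First stage: shared feature map (assumes block3 output, depth 256 * 4).
    feat_h = math.ceil(height / features_stride)
    feat_w = math.ceil(width / features_stride)
    rpn_features = (1, feat_h, feat_w, 1024)

    # Each proposal is cropped from the feature map and resized to a fixed size.
    crops = (num_proposals, initial_crop_size, initial_crop_size, 1024)

    # Max-pool over each crop (no-padding output size formula).
    pooled_size = (initial_crop_size - maxpool_kernel_size) // maxpool_stride + 1
    pooled = (num_proposals, pooled_size, pooled_size, 1024)

    # Second stage: block4 with stride 1 keeps the spatial size, depth -> 2048.
    box_classifier_features = (num_proposals, pooled_size, pooled_size, 2048)

    return {
        "rpn_features": rpn_features,
        "crops": crops,
        "pooled": pooled,
        "box_classifier_features": box_classifier_features,
    }


for name, shape in trace_shapes((600, 1024)).items():
    print(name, shape)
```

The point of the trace is that block4 runs once per proposal on small fixed-size crops, not once on the whole image.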
Danny Fang