
I am wondering about the number of floating-point operations (FLOPs) reported for TensorFlow convolutional layers.

While waiting for this functionality to be released for TF 2.x, I tried it out on TF 1.x, and I got results whose computation I cannot explain, one of which is particularly puzzling (see Q3).

I have the following code:

import tensorflow as tf
from tensorflow.keras.layers import InputLayer, Conv2D, Flatten, Dense

tf.reset_default_graph()
model = tf.keras.models.Sequential([
        InputLayer((32, 32, 1)),
        # Conv2D(1, 5, padding='same'),
        # Flatten(),
        # Dense(1, activation='softmax')
    ])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Profile the default graph, counting float operations per op.
opts = tf.profiler.ProfileOptionBuilder.float_operation()
profile = tf.profiler.profile(tf.get_default_graph(), tf.RunMetadata(),
                              cmd='op', options=opts)
profile.total_float_ops

Complete Gist here:

https://colab.research.google.com/gist/eduardo4jesus/6721ec992c402bcdc834ab2edbc1b2b4/tf1-flops.ipynb

What is the explanation for the results below?

  1. If I run the code above with only the InputLayer uncommented, the FLOPs output is 2.

Q1: Why 2?

  2. If I run the code below, the output is 2050.
model = tf.keras.models.Sequential([
        InputLayer((32, 32, 1)),
        Flatten(),
        Dense(1, activation='softmax')
    ])

Q2: Why 2050?? I was expecting 1026: 1024 plus those unexplained 2. The 1024 would come from the weights of the dense layer, since a single neuron has one parameter per input feature, hence 1024. Again, why double? (Backpropagation??)
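The expectation in Q2 can be restated as quick arithmetic (plain Python, only using the numbers already given above):

```python
# The question's expectation: one weight per input feature, plus the 2 ops
# already reported for the bare InputLayer model.
n_features = 32 * 32 * 1        # 1024 inputs after flattening
dense_weights = n_features * 1  # Dense(1) has one weight per feature
expected = dense_weights + 2
observed = 2050
print(expected, observed, (observed - 2) // dense_weights)  # 1026 2050 2
```

The last value shows the observed count (minus the baseline 2) is exactly twice the weight count, which is what prompts the "why double?" question.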

  3. The most intriguing and important one. If I run the code below, the output is 2101.
model = tf.keras.models.Sequential([
        InputLayer((32, 32, 1)),
        Conv2D(1, 5, padding='same'),
        Flatten(),
        Dense(1, activation='softmax')
    ])

Q3: Why 2101?? I was expecting 2050 + 1024 x 25, which is far greater than 2101. The convolution layer alone should cost N*N*K*K, where N=32 and K=5. How can the whole model take fewer FLOPs than its last layer alone, given that the convolution produces the same shape as its input? What kind of crazy optimization is going on?
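For reference, the naive multiply count the question expects for the convolution (N=32, K=5, 'same' padding, one input and one output channel, one K*K dot product per output pixel):

```python
# Expected multiplications for the Conv2D layer alone, as estimated in Q3.
N, K = 32, 5
conv_mults = N * N * K * K
print(conv_mults)  # 25600 -- far more than the 2101 the profiler reports in total
```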

[Update]

When printing profile, I see the following nodes contributing to total_float_ops. Most of them (see below) are associated with the initializers, not the model computation itself.

name: "_TFProfRoot"
total_float_ops: 2101
children {
  name: "Mul"
  float_ops: 1050
  total_float_ops: 2101
  graph_nodes {
    name: "conv2d/kernel/Initializer/random_uniform/mul"
    float_ops: 25
    total_float_ops: 25
    input_shapes {
      key: 0
      value {
        dim {
          size: 5
        }
        dim {
          size: 5
        }
        dim {
          size: 1
        }
        dim {
          size: 1
        }
      }
    }
    input_shapes {
      key: 1
      value {
        dim {
          size: 1
        }
      }
    }
    total_definition_count: 1
  }
  graph_nodes {
    name: "dense/kernel/Initializer/random_uniform/mul"
    float_ops: 1024
    total_float_ops: 1024
    input_shapes {
      key: 0
      value {
        dim {
          size: 1024
        }
        dim {
          size: 1
        }
      }
    }
    input_shapes {
      key: 1
      value {
        dim {
          size: 1
        }
      }
    }
    total_definition_count: 1
  }
  graph_nodes {
    name: "loss/dense_loss/weighted_loss/Mul"
    input_shapes {
      key: 0
      value {
        dim {
          size: -1
        }
      }
    }
    input_shapes {
      key: 1
      value {
        dim {
          size: -1
        }
      }
    }
    total_definition_count: 1
  }
  graph_nodes {
    name: "loss/dense_loss/weighted_loss/broadcast_weights"
    input_shapes {
      key: 0
      value {
        dim {
          size: 1
        }
      }
    }
    input_shapes {
      key: 1
      value {
        dim {
          size: -1
        }
      }
    }
    total_definition_count: 1
  }
  graph_nodes {
    name: "loss/mul"
    float_ops: 1
    total_float_ops: 1
    input_shapes {
      key: 0
      value {
        dim {
          size: 1
        }
      }
    }
    input_shapes {
      key: 1
      value {
        dim {
          size: 1
        }
      }
    }
    total_definition_count: 1
  }
  children {
    name: "Add"
    float_ops: 1049
    total_float_ops: 1051
    graph_nodes {
      name: "conv2d/kernel/Initializer/random_uniform"
      float_ops: 25
      total_float_ops: 25
      input_shapes {
        key: 0
        value {
          dim {
            size: 5
          }
          dim {
            size: 5
          }
          dim {
            size: 1
          }
          dim {
            size: 1
          }
        }
      }
      input_shapes {
        key: 1
        value {
          dim {
            size: 1
          }
        }
      }
      total_definition_count: 1
    }
    graph_nodes {
      name: "dense/kernel/Initializer/random_uniform"
      float_ops: 1024
      total_float_ops: 1024
      input_shapes {
        key: 0
        value {
          dim {
            size: 1024
          }
          dim {
            size: 1
          }
        }
      }
      input_shapes {
        key: 1
        value {
          dim {
            size: 1
          }
        }
      }
      total_definition_count: 1
    }
    children {
      name: "Sub"
      float_ops: 2
      total_float_ops: 2
      graph_nodes {
        name: "conv2d/kernel/Initializer/random_uniform/sub"
        float_ops: 1
        total_float_ops: 1
        input_shapes {
          key: 0
          value {
            dim {
              size: 1
            }
          }
        }
        input_shapes {
          key: 1
          value {
            dim {
              size: 1
            }
          }
        }
        total_definition_count: 1
      }
      graph_nodes {
        name: "dense/kernel/Initializer/random_uniform/sub"
        float_ops: 1
        total_float_ops: 1
        input_shapes {
          key: 0
          value {
            dim {
              size: 1
            }
          }
        }
        input_shapes {
          key: 1
          value {
            dim {
              size: 1
            }
          }
        }
        total_definition_count: 1
      }
    }
  }
}
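Summing the per-op float_ops in the dump above reproduces the reported total, and makes the point concrete: every counted op belongs to a kernel initializer or the loss, while the Conv2D and MatMul ops themselves contribute nothing (plain Python, numbers copied from the dump):

```python
# Per-op float_ops from the profiler dump above.
mul_ops = 25 + 1024 + 1   # conv kernel init mul + dense kernel init mul + loss/mul
add_ops = 25 + 1024       # conv kernel init add + dense kernel init add
sub_ops = 1 + 1           # conv kernel init sub + dense kernel init sub
total = mul_ops + add_ops + sub_ops
print(total)  # 2101, matching total_float_ops
```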

1 Answer

I think this API is, at best, experimental.

Q1. No idea where the 2 is coming from.

Q2. The 2 is related to the input, as we saw, which leaves 2048. Your input size is 32*32*1, which is 1024 when flattened. The computation is xW+b, where x is [1024] and the corresponding W is [1024, 1]. The xW operation requires 1024 multiplications and 1024 additions. The bias add seems to be ignored, because counting it would otherwise give a total of 2051 ops: 2+1024+1024+1.
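That accounting can be sanity-checked with plain arithmetic (the 2051 figure is what the total would be if the bias add were counted):

```python
# FLOP accounting for Dense(1) on a flattened 32*32*1 input.
input_ops = 2   # the baseline 2 ops attributed to the bare InputLayer model
mults = 1024    # one multiplication per weight in xW
adds = 1024     # one addition per weight in xW
total_no_bias = input_ops + mults + adds
total_with_bias = total_no_bias + 1  # hypothetical count if bias add were included
print(total_no_bias, total_with_bias)  # 2050 2051
```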

Q3. I changed your filter size to 3 and got 21 FLOPs, which is quite ridiculous. The number didn't change between the CPU and GPU executors. My conclusion is that the profiler does not produce believable numbers for convolutional layers.

tf.keras.models.Sequential([
        InputLayer((32, 32, 1)),
        Conv2D(1, 3, padding='same'),
        Flatten(),
    ]) # => 21 ops



tf.keras.models.Sequential([
    InputLayer((32, 32, 1)),
    Conv2D(32, 3, padding='same'),
    Conv2D(1, 3, padding='same'),
    Flatten(),
]) # => 1.09K ops