Why are `xavier_initializer()` and `glorot_uniform_initializer()` duplicated to some extent?

Question

xavier_initializer(uniform=True, seed=None, dtype=tf.float32) and glorot_uniform_initializer(seed=None, dtype=tf.float32) are both named after the same person, Xavier Glorot. Why not consolidate them into one function?

xavier_initializer is in tf.contrib.layers, while glorot_uniform_initializer is in tf. Will the contrib namespace eventually go away, with its contents moved into the tf namespace?

For the benefit of posterity, please consider accepting whichever answer helped you the most! — kmario23, Sep 08 '19 at 15:28

kmario23 · Answer 1 · 2019-09-08T15:26:57.537

Yes, tf.contrib.layers.xavier_initializer and tf.glorot_uniform_initializer both implement the same concept described in this JMLR paper, Understanding the difficulty of training deep feedforward neural networks, as the source code of both shows.

With mode = FAN_AVG and uniform = True, both implementations sample values from a uniform distribution over [-limit, limit), where limit = sqrt(3 / n) and n = (fan_in + fan_out) / 2. For fan_in = fan_out = 1, for example, this is [-sqrt(3), sqrt(3)).
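To make the bound concrete, here is a minimal pure-Python sketch of the Glorot/Xavier uniform rule. It is not the TensorFlow source: the helper names glorot_uniform_limit and glorot_uniform_sample are hypothetical, and it assumes a scale factor of 1.0 with FAN_AVG mode.

```python
import math
import random


def glorot_uniform_limit(fan_in, fan_out):
    """Bound of the uniform distribution (hypothetical helper, not a TF API)."""
    n = (fan_in + fan_out) / 2.0   # FAN_AVG: average of fan_in and fan_out
    return math.sqrt(3.0 / n)      # assumes a scale factor of 1.0


def glorot_uniform_sample(fan_in, fan_out, seed=None):
    """Draw a fan_in x fan_out weight matrix as nested lists."""
    rng = random.Random(seed)
    limit = glorot_uniform_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


# With fan_in = fan_out = 1 the bound equals sqrt(3).
limit_1x1 = glorot_uniform_limit(1, 1)
weights = glorot_uniform_sample(4, 5, seed=0)
```

Since sqrt(3 / n) with n = (fan_in + fan_out) / 2 equals sqrt(6 / (fan_in + fan_out)), this is the same bound usually quoted for Glorot uniform initialization.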

Because tf.initializers supports a wide variety of initialization strategies, it is highly likely to stay, whereas the initialization from contrib, which just has xavier_initializer, will most probably be deprecated in future versions.

So yes, it is highly likely that tf.contrib.layers.xavier_initializer will go away in future versions.

Dylan F · Answer 2 · 2017-12-27T06:45:06.387

Interesting question! I'll start with tf.contrib:

Will the namespace of contrib go away? Only when there are no more unstable community contributions to add to TensorFlow, so never. This question may be of interest. I'll summarize. The contrib namespace is for user-contributed code that is supported by the community (not by the TensorFlow team). Code in contrib is useful enough to be in the API, and will probably be merged eventually. But until it has been thoroughly tested by the TensorFlow team, it stays in contrib. I'm confident the docs used to explain why contrib exists, but I can't find that explanation anymore. The closest thing is in the API stability promise, which explains that contrib functions/classes are subject to change!

A little more in-depth: things in contrib generally merge into tf eventually. For example, the entirety of Keras merged from contrib to tf.keras in 1.4. But the exact process of the merge varies. For instance, compare tf.contrib.rnn and the RNN functionality in tf.nn. Quite a bit of tf.nn aliases tf.contrib.rnn. I mean, click anything on the tf.nn.rnn_cell guide. You'll be looking at the tf.contrib.rnn doc! Try it! It seems that using tf.contrib.rnn is very stable, despite the fact that it has migrated into "native" tf. On the other hand, the Datasets merge isn't so clean (contrib'ed in 1.3, merged in 1.4). Because some - very few - bits of code were changed during the merge, using tf.contrib.data.TFRecordDataset will give you a nice deprecation warning. And some things have been in contrib for quite a while and show no signs of merging soon: tf.contrib.training.stratified_sample comes to mind. I believe contrib.keras had been around for a while before merging.

Now onto Xavier/Glorot:

Here are links to the source for contrib.xavier... and tf.glorot.... The source code looks (nearly) the same, but let's follow variance_scaling_initializer. Now things differ: xavier uses a function and glorot uses a class (VarianceScaling is aliased as variance_scaling_initializer). Similar again, yes, but at a glance the "native" tf version gives us some different error messages and some better input validation.
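The structural difference can be illustrated with a small pure-Python sketch. This is not the TensorFlow code: variance_scaling_fn and VarianceScalingSketch are hypothetical names, and the validation shown is only an assumed example of the kind of check a class-based initializer can do up front.

```python
import math
import random


def variance_scaling_fn(factor=1.0, seed=None):
    """Function style: return a closure that does the sampling."""
    rng = random.Random(seed)

    def _initializer(fan_in, fan_out):
        limit = math.sqrt(3.0 * factor / ((fan_in + fan_out) / 2.0))
        return [rng.uniform(-limit, limit) for _ in range(fan_in * fan_out)]

    return _initializer


class VarianceScalingSketch:
    """Class style: validate arguments in __init__, sample in __call__."""

    def __init__(self, scale=1.0, seed=None):
        # Illustrative validation only; the real checks may differ.
        if scale <= 0.0:
            raise ValueError("`scale` must be a positive float.")
        self.scale = scale
        self._rng = random.Random(seed)

    def __call__(self, fan_in, fan_out):
        limit = math.sqrt(3.0 * self.scale / ((fan_in + fan_out) / 2.0))
        return [self._rng.uniform(-limit, limit)
                for _ in range(fan_in * fan_out)]


# Both styles are called the same way and, with the same seed,
# produce the same values in this sketch.
from_function = variance_scaling_fn(seed=42)(2, 3)
from_class = VarianceScalingSketch(seed=42)(2, 3)
```

Under these assumptions the two styles are interchangeable for the caller; the class version simply has a natural place to reject bad arguments before any sampling happens.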

So why not remove contrib.xavier? I don't know. If I had to speculate, it's because contrib.xavier took off. I mean, I still use it and I still see it all the time (citation needed?). Now that I know glorot is basically the same, I'm not sure that I'll keep using contrib.xavier. But I digress. I suspect xavier has stayed around because removing it would break a reasonable amount of code. Sure, there's no stability promise for contrib, but why fix (or break) what's not broken?

Posting an issue or pull request on GitHub could generate some more interesting responses from actual contributors. I suspect you would get reasons it hasn't been and won't be removed, but maybe not. My quick search for "xavier" and then "glorot" in the Issues suggests it hasn't been asked before.

EDIT: To be clear, as kmario points out, they're mathematically identical. I'm pointing out that the implementation, as it is today, differs slightly in the realm of input validation and structure. He seems to think xavier is more likely to be deprecated than I initially thought. I'll happily defer to him because he's probably more experienced than I am.
