
I notice there are two functions for negative sampling in TensorFlow to compute the loss (`sampled_softmax_loss` and `nce_loss`). The parameters of these two functions are similar, but I really want to know: what is the difference between the two?

王乐义

3 Answers


Sampled softmax is all about selecting a sample of the given number of classes and computing the softmax loss over only that subset. The main objective is to make the result of the sampled softmax approximate the true softmax, so the algorithm concentrates heavily on how those samples are selected from the given distribution. NCE loss, on the other hand, is more about selecting noise samples and trying to mimic the true softmax: it takes only one true class and K noise classes.
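To make the comparison concrete, here is a minimal sketch (all sizes are hypothetical) showing that `tf.nn.nce_loss` and `tf.nn.sampled_softmax_loss` take the same arguments and differ only in the loss each computes:

```python
import tensorflow as tf

# Hypothetical sizes for illustration only.
num_classes = 10000   # e.g. vocabulary size
dim = 128             # hidden/embedding dimension
batch_size = 32
num_sampled = 64      # number of negative classes sampled per batch

weights = tf.Variable(tf.random.normal([num_classes, dim]))
biases = tf.Variable(tf.zeros([num_classes]))
inputs = tf.random.normal([batch_size, dim])  # stands in for hidden activations
labels = tf.random.uniform([batch_size, 1], maxval=num_classes, dtype=tf.int64)

# Identical arguments; only the loss computed over the sampled classes differs.
nce = tf.nn.nce_loss(weights=weights, biases=biases, labels=labels,
                     inputs=inputs, num_sampled=num_sampled,
                     num_classes=num_classes)

sampled_softmax = tf.nn.sampled_softmax_loss(weights=weights, biases=biases,
                                             labels=labels, inputs=inputs,
                                             num_sampled=num_sampled,
                                             num_classes=num_classes)
```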

Shamane Siriwardhana

Sampled softmax tries to normalise over all samples in your output. If you have a non-normal distribution (logarithmic over your labels), this is not an optimal loss function. Note that although they have the same parameters, the way you use the two functions is different. Take a look at the documentation here: https://github.com/calebchoo/Tensorflow/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/shard4/tf.nn.nce_loss.md and read this line:

By default this uses a log-uniform (Zipfian) distribution for sampling, so your labels must be sorted in order of decreasing frequency to achieve good results. For more details, see log_uniform_candidate_sampler.
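Practically, that means assigning vocabulary IDs in decreasing order of frequency before training. A minimal sketch (the `corpus` list here is a hypothetical toy example):

```python
from collections import Counter

# Assign IDs in decreasing frequency order so the log-uniform (Zipfian)
# candidate sampler's assumption holds: id 0 is the most frequent token.
corpus = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]  # toy data
counts = Counter(corpus)
vocab = {word: idx for idx, (word, _) in enumerate(counts.most_common())}
# vocab == {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4}
```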

Take a look at this paper where they explain why they use it for word embeddings: http://papers.nips.cc/paper/5165-learning-word-embeddings-efficiently-with-noise-contrastive-estimation.pdf

Hope this helps!

rmeertens
  • the first link is broken, use: https://github.com/calebchoo/Tensorflow/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/shard4/tf.nn.nce_loss.md – Denis Kuzin Jun 22 '17 at 15:16
  • Thanks! Fixed the link – rmeertens Jun 22 '17 at 16:00
  • "your labels must be sorted in order of decreasing frequency" -- what does this mean practically? you have to sort the examples in your mini-batch by label frequency? – eggie5 May 01 '20 at 00:38
  • 1
    @eggie5, the id, which is encoded as 0 should be the most frequent one in your corpus. And the id, which is encoded as `n_ids-1` should be the least frequent one in your corpus. – YQ.Wang Mar 17 '21 at 06:47

Check out this documentation from TensorFlow: https://www.tensorflow.org/extras/candidate_sampling.pdf

They seem pretty similar, but sampled softmax is only applicable for a single label, while NCE extends to the case where your labels are a multiset. NCE can then model the expected counts rather than the presence/absence of a label. I'm not clear on an exact example of when to use sampled softmax.
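If that distinction holds, a sketch of the multiset case might look like this, passing two true classes per example via `num_true=2` to `nce_loss` (all sizes are hypothetical):

```python
import tensorflow as tf

# Hypothetical sizes for illustration only.
num_classes, dim, batch_size, num_sampled = 1000, 64, 8, 32

weights = tf.Variable(tf.random.normal([num_classes, dim]))
biases = tf.Variable(tf.zeros([num_classes]))
inputs = tf.random.normal([batch_size, dim])

# Two target classes per example: the labels form a multiset, not a single id.
labels = tf.random.uniform([batch_size, 2], maxval=num_classes, dtype=tf.int64)

loss = tf.nn.nce_loss(weights=weights, biases=biases, labels=labels,
                      inputs=inputs, num_sampled=num_sampled,
                      num_classes=num_classes, num_true=2)
```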

phdscm