
As I understand it, the plain word2vec approach uses two matrices. Assuming the corpus has a vocabulary of N words: a weighted input matrix (WI) with dimensions NxF (F is the number of features), and a weighted output matrix (WO) with dimensions FxN. We multiply a 1xN one-hot vector by WI and get a 1xF hidden layer vector. We then multiply that vector by WO and get a 1xN output vector, apply the softmax function, and choose the highest entry (probability) in the vector.

Question: how does this work in the hierarchical softmax model? What is multiplied by which matrix to get the two values (the left/right probabilities) that decide which branch to take? P.S. I do understand the idea of the hierarchical softmax model, with its binary tree and so on; I just don't know how the multiplications are done mathematically.
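For concreteness, here is a minimal NumPy sketch of how I picture that forward pass (the sizes and random weights are just placeholders):

```python
import numpy as np

N, F = 8, 4                      # vocabulary size, number of features
rng = np.random.default_rng(0)
WI = rng.normal(size=(N, F))     # weighted input matrix,  N x F
WO = rng.normal(size=(F, N))     # weighted output matrix, F x N

x = np.zeros(N)
x[3] = 1.0                       # 1 x N one-hot vector for input word 3

h = x @ WI                       # hidden layer, 1 x F (just row 3 of WI)
u = h @ WO                       # output scores, 1 x N
p = np.exp(u) / np.exp(u).sum()  # softmax over the whole vocabulary
print(p.argmax())                # index of the most probable output word
```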

Thanks

abbudeh

1 Answer


To make things easy, assume that N is a power of 2. The binary tree then has N-1 inner nodes. Each inner node owns one column of WO, which now has dimensions Fx(N-1); the value at inner node j is the dot product of the 1xF hidden vector with column j of WO.
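A minimal sketch of that layout in NumPy (heap-style node numbering is just one convenient choice, and all names and sizes here are illustrative):

```python
import numpy as np

N, F = 8, 4                      # N a power of 2, so N - 1 inner nodes
rng = np.random.default_rng(0)
WO = rng.normal(size=(F, N - 1)) # one F-dimensional column per inner node
h = rng.normal(size=F)           # hidden vector for some input word

# Heap numbering: inner nodes are 1..N-1, node j's children are 2*j and
# 2*j + 1, and leaf N + w holds word w.  scores[j - 1] is node j's value.
scores = h @ WO                  # values for all inner nodes at once
print(scores.shape)              # (N - 1,)
```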

Once you have computed a value for each inner node, turn it into left and right branch probabilities: apply something like a sigmoid function to get (say) the left-branch probability; the right branch is just 1 minus the left.
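In code, each inner-node value becomes a pair of complementary probabilities (sigmoid is one common choice; the value 0.8 below is just an example):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

s = 0.8                  # one inner node's value, i.e. h @ WO[:, j - 1]
p_left = sigmoid(s)      # probability of taking the left branch
p_right = 1.0 - p_left   # the right branch is the complementary event
print(p_left + p_right)  # the two always sum to 1.0
```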

To predict, follow the maximum-probability path from the root down to a leaf; the leaf you arrive at is the predicted word.
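Continuing the same illustrative setup, the descent looks like this (a greedy walk that takes the more probable branch at each node):

```python
import numpy as np

N, F = 8, 4
rng = np.random.default_rng(0)
WO = rng.normal(size=(F, N - 1))
h = rng.normal(size=F)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

node = 1                                   # start at the root
while node < N:                            # inner nodes are 1..N-1
    p_left = sigmoid(h @ WO[:, node - 1])
    node = 2 * node if p_left >= 0.5 else 2 * node + 1
print("predicted word:", node - N)         # leaf N + w holds word w
```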

To train, find the leaf for the correct word and the path of inner nodes connecting it back to the root, then backpropagate through just those log2(N) nodes instead of all N output units.
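A sketch of one training update under the same assumptions: each inner node on the path is trained as a binary classifier (left = 1, right = 0), so its gradient is the familiar sigmoid(s) - target, and only log2(N) columns of WO are touched. The learning rate and shapes are illustrative:

```python
import numpy as np

N, F, lr = 8, 4, 0.05
rng = np.random.default_rng(0)
WO = rng.normal(size=(F, N - 1))
h = rng.normal(size=F)               # hidden vector of the input word

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

target = 5                           # index of the correct output word
node = N + target                    # its leaf in the heap numbering
path = []                            # (inner node, went-left?) pairs
while node > 1:                      # climb from the leaf to the root
    path.append((node // 2, node % 2 == 0))  # even index => left child
    node //= 2

grad_h = np.zeros(F)                 # gradient to send back into WI
for j, went_left in path:            # exactly log2(N) inner nodes
    s = h @ WO[:, j - 1]
    g = sigmoid(s) - (1.0 if went_left else 0.0)  # dLoss/ds at this node
    grad_h += g * WO[:, j - 1]
    WO[:, j - 1] -= lr * g * h       # update this node's column only
# grad_h would then update the input word's row of WI
```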

dan