As I understand it, the simple word2vec approach uses two matrices, as follows. Assuming the vocabulary consists of N words:

- an input weight matrix (WI) of dimensions NxF (F is the number of features);
- an output weight matrix (WO) of dimensions FxN.

We multiply a 1xN one-hot vector with WI and get a 1xF hidden-layer vector. We then multiply that hidden vector with WO and get a 1xN output vector. Finally, we apply the softmax function and choose the entry with the highest probability.

Question: how does this look when using the hierarchical softmax model? What is multiplied with which matrix to get the two-dimensional vector that decides whether to branch left or right?

P.S. I do understand the idea of the hierarchical softmax model using a binary tree and so on, but I don't know how the multiplications are done mathematically.
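To make my understanding of the plain-softmax case concrete, here is a small sketch of the forward pass I have in mind (the sizes and names are my own toy choices, not from any particular implementation):

```python
import numpy as np

# Toy sizes: N = vocabulary size, F = number of features (arbitrary values)
N, F = 5, 3
rng = np.random.default_rng(0)
WI = rng.normal(size=(N, F))  # input weight matrix, N x F
WO = rng.normal(size=(F, N))  # output weight matrix, F x N

x = np.zeros((1, N))
x[0, 2] = 1.0                 # 1 x N one-hot vector for word index 2

hidden = x @ WI               # 1 x F hidden layer (just row 2 of WI)
scores = hidden @ WO          # 1 x N output vector
probs = np.exp(scores) / np.exp(scores).sum()  # softmax over all N words

predicted = int(np.argmax(probs))  # index of the highest-probability word
```

Note that multiplying the one-hot vector with WI simply selects one row of WI, so `hidden` equals `WI[2]` here.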
Thanks