
This seems like it would be simple, but I can't get things to work. I have two 100-dimensional vector spaces, with several vectors in each space that are matched. I want to find the transformation matrix (W) such that:

a_vector[0] in vector space A × W = b_vector[0] in vector space B (or an approximation).

So a paper mentions the formula for this:

    min_W Σ_i ||W·x_i − z_i||^2

Note there is no bias term and no activation function that we typically see.

I've tried using sklearn's LinearRegression without much success:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

regression_model = LinearRegression(fit_intercept=True)
regression_model.fit(X_train, y_train)

regression_model.score(X_test, y_test)
# -1451478.4589335269 (!!???)

y_predict = regression_model.predict(X_test)
regression_model_mse = mean_squared_error(y_test, y_predict)
# regression_model_mse = 524580.06

I also tried TensorFlow without much success. I don't care about the tool (TensorFlow, sklearn, whatever); I'm just looking for help with a solution.
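(For reference, a minimal sketch of the same fit aligned with the paper's no-bias formulation, scored with per-pair cosine similarity instead of R². It assumes X_train, y_train, X_test, y_test are (n_samples, 100) arrays of matched vectors, as in the snippet above.)

import numpy as np
from sklearn.linear_model import LinearRegression

# the paper's objective has no bias term, so drop the intercept
model = LinearRegression(fit_intercept=False)
model.fit(X_train, y_train)
W = model.coef_.T  # (100, 100), so that X_test @ W approximates y_test

# cosine similarity between each predicted vector and its target
pred = X_test @ W
cos = np.sum(pred * y_test, axis=1) / (
    np.linalg.norm(pred, axis=1) * np.linalg.norm(y_test, axis=1))
print(cos.mean())  # 1.0 would be a perfect match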

Thanks.

EDIT

So I hand-rolled the code below, maximizing for cosine similarity (representing how close the predicted points are to the real points; 1.00 = perfect match), but it is VERY SLOW.

import sys
import numpy as np
import scipy.spatial.distance
from random import shuffle

shape = (100, 100)
W1 = np.random.randn(*shape).astype(np.float64) / np.sqrt(sum(shape))
avgs = []
for epoch in range(1000):
    shuffle(endevec)  # endevec: list of (source_vec, target_vec) pairs
    distance = []
    for source_vec, target_vec in endevec:
        pred1 = source_vec.dot(W1)
        # cosine similarity between prediction and target (1.0 = perfect match)
        cosine = 1 - scipy.spatial.distance.cosine(pred1, target_vec)
        distance.append(cosine)
        # gradient of ||source_vec.dot(W1) - target_vec||^2 w.r.t. W1
        diff = pred1 - target_vec
        gradient = np.outer(source_vec, diff) / W1.shape[0]
        W1 += -gradient * .0001
    avgs.append(np.mean(distance))
    # overwrite the same console line with the latest epoch average
    sys.stdout.write('\r' + str(avgs[-1]))
    sys.stdout.flush()
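(The per-vector Python loop above is what makes this slow. A full-batch gradient step does the same update in a few vectorized NumPy calls; a sketch, assuming A and B are (n, 100) arrays stacking the matched source and target vectors:)

import numpy as np

# A, B: (n, 100) arrays of matched source/target vectors (assumed names)
W = np.random.randn(100, 100) / np.sqrt(200)
for epoch in range(1000):
    pred = A @ W                # all predictions at once
    diff = pred - B             # per-pair residuals
    grad = A.T @ diff / len(A)  # gradient of the averaged squared error (up to a constant)
    W -= 1e-4 * grad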

EDIT 2

JeanneDark below had a great answer for finding the transformation matrix using `M = np.linalg.lstsq(source_mtrx[:n], target_mtrx[:n])[0]`.

On my dataset of matched vectors, the cosine similarities between the predicted and actual vectors, using the transformation matrix found with this method, were:

minmax=(-0.09405095875263214, 0.9940633773803711)
mean=0.972490919224675 (1.0 being a perfect match) 
variance=0.0011325349465895844
skewness=-18.317443753033665
kurtosis=516.5701661370497

There was a tiny number of really big outliers.
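(Those fields match the output of scipy.stats.describe; a sketch of reproducing them, assuming sims is the array of per-pair cosine similarities:)

from scipy import stats

# returns nobs, minmax, mean, variance, skewness, kurtosis
print(stats.describe(sims))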

[Plot of the cosine similarities omitted]

  • Regression/OLS may not be the right tool. It is only appropriate if we want to find x given W (design matrix) and z (outcome vector), but it looks like you are finding W given x and z. – pe-perry Mar 20 '18 at 04:28
  • But that's specifically the tool mentioned in the paper. – E G Mar 20 '18 at 16:37
  • I might misunderstand your problem. Would you provide some examples of `x` and `y` and why there is train and test set? Or even provide a link of the paper? Thanks! – pe-perry Mar 21 '18 at 02:09
  • The X (input) are 100-dim vectors in vector space A; the Y (output) are 100-dim vectors in vector space B. These points are matched. I want to find a transformation matrix such that vec1 from A × matrix = vec1 from B. The difficulty seems to be going from a 100-dim input to a 100-dim output (most SGD examples seem to reduce dimensionality in the output), and that the inputs and outputs are continuous, not categorical. The paper is below, but it doesn't address this issue; the only mention is the one I posted above. https://arxiv.org/abs/1309.4168. Thanks. – E G Mar 25 '18 at 01:53

1 Answer


I was having exactly the same problem yesterday. I ended up using `numpy.linalg.lstsq` and I think it works.

import numpy as np

# find transformation matrix M so that source_mtrx . M ≈ target_mtrx,
# based on the top n most frequent terms in the target corpus
n = 500  # the choice of n depends on the size of your vocabulary
M = np.linalg.lstsq(source_mtrx[:n], target_mtrx[:n], rcond=None)[0]
print(M.shape)  # returns (100, 100)

# apply this transformation to the source matrix:
new_mtrx = np.array([np.dot(i, M) for i in source_mtrx])
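(For intuition, lstsq here solves min over M of ||source_mtrx[:n]·M − target_mtrx[:n]||², the same least-squares objective as the formula in the question; a sketch of the equivalent closed form via the Moore-Penrose pseudoinverse:)

import numpy as np

# the pseudoinverse yields the same minimum-norm least-squares solution
# that np.linalg.lstsq returns
M_alt = np.linalg.pinv(source_mtrx[:n]) @ target_mtrx[:n]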

Also check out the paper Lexical Comparison Between Wikipedia and Twitter Corpora by Using Word Embeddings. It builds on the paper that you mentioned and follows the same method, but explains the implementation in more detail. For example, they suggest that in order to find the transformation matrix M, we use only the vectors of the top n most frequent terms, and then, after we apply the transformation to the source matrix, we calculate the similarity for the remaining terms.
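(A sketch of that evaluation step, assuming source_mtrx and target_mtrx are row-aligned (vocab_size, 100) arrays sorted by term frequency:)

import numpy as np
from scipy.spatial.distance import cosine

# transform the remaining (less frequent) terms and measure how close
# each predicted vector lands to its matched target
pred = source_mtrx[n:] @ M
sims = [1 - cosine(p, t) for p, t in zip(pred, target_mtrx[n:])]
print(np.mean(sims))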

Please let me know if you find another solution for calculating M based on SGD.

JeanneDark
  • Thanks for your help, Jeanne. linalg worked pretty well and very fast. For my set of matched vecs the describe stats were: minmax=(-0.09405095875263214, 0.9940633773803711), mean=0.972490919224675, variance=0.0011325349465895844, skewness=-18.317443753033665, kurtosis=516.5701661370497 – E G Mar 29 '18 at 19:01
  • Thanks for your help. I added your advice and a summary of my results above. – E G Mar 29 '18 at 19:05
  • I'm happy it worked :). Could I ask what dataset you are using and what is its size? I am using a quite small one so my results are not that good yet but i'm planning to use the model for a much bigger one, once I have access to the data – JeanneDark Mar 31 '18 at 11:01
  • I am using a custom dataset. It's actually document vectors, not word vectors. The dataset is about 5000-6000 documents. I am trying to find an accurate mapping by taking a subset: fit a transformation matrix on 5000 doc vecs to get a fairly accurate mapping between two sets of 50,000. – E G Apr 01 '18 at 17:51