
I have an API that returns `most_similar_approx` results from a pymagnitude model. The model is built from the native Word2Vec format with 50 dimensions and 50 trees; it is close to 350 MB, with approximately 350,000 tokens. Load testing this API, I observed that performance deteriorates as I increase the `topn` value for `most_similar_approx`. I need a high number of similar tokens for downstream activities: with topn=150 I get a throughput of 500 transactions per second on the API, while gradually reducing it I get 800 TPS with topn=50 and ~1300 with topn=10. The server instance is not under any memory/CPU load; I am using a c5.xlarge AWS EC2 instance.

Is there any way I can tune the model to improve performance at a high `topn` value? My aim is to obtain most-similar tokens from word embeddings, and pymagnitude was the most recommended option I found. Are there any similarly high-performing alternatives?
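For reference, the call pattern being load-tested can be reproduced with a small timing harness. This is a minimal sketch, not the actual nGrinder setup; the model filename and query token are illustrative, and the pymagnitude call is shown as a comment so the harness itself stays self-contained:

```python
import time

def measure_tps(fn, duration=1.0):
    """Call fn repeatedly for `duration` seconds and return transactions/sec."""
    end = time.perf_counter() + duration
    calls = 0
    while time.perf_counter() < end:
        fn()
        calls += 1
    return calls / duration

# In the real benchmark, the callable would wrap the pymagnitude query, e.g.:
#   from pymagnitude import Magnitude
#   vectors = Magnitude("word2vec_50d.magnitude")   # hypothetical filename
#   fn = lambda: vectors.most_similar_approx("token", topn=150)
#   print(measure_tps(fn, duration=10.0))
```

Measuring the raw query in isolation like this helps separate pymagnitude's cost from API-layer overhead (serialization, networking) before tuning either.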

  • Can you show your benchmarking code? Have you tested against plain `most_similar()`, in either `pymagnitude` or `gensim`? Have you tried varying the `effort` parameter to sacrifice accuracy for speed? Why *isn't* the current rate fast enough? Have you tried a machine with more virtual cores, or could simply using more parallel cores or more machines be a solution in practice? – gojomo Feb 02 '21 at 18:14
  • I used nGrinder (https://github.com/naver/ngrinder) for the tests. Based on these, I found performance to be in order of: Gensim's most_similar < Pymagnitude's most_similar < Pymagnitude's most_similar_approx. I did try using the effort parameter, down to 0.5, but it did not help significantly. With larger instances, e.g. c5.2xlarge, I do get twice the performance (TPS=1000 at topn=150), but I cannot use these due to cost constraints. External caching and larger instances are my final options if optimization fails – ptonapi Feb 03 '21 at 07:28
  • Can you show your benchmarking code, and the actual (not figurative) results for each alternative tested, in the question? How are you verifying the single server is under no memory/CPU load during testing, & if that's true can you simply use more processes on the same machine? – gojomo Feb 03 '21 at 17:29
  • Also, 350K tokens times 150 most-similar int indexes (4 bytes) & float similarities (4 bytes) is only 420MB. You could precalculate & store all top-150 actual neighbors in that much RAM & convert all operations to simple lookup. (Or all top-1500 actual neighbors in just 4.2GB.) – gojomo Feb 03 '21 at 18:21
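The precompute-and-lookup idea in the last comment can be sketched in plain Python. This is illustrative only (brute-force cosine over a toy vocabulary); for 350K tokens you would build the table offline with matrix operations or the ANN index itself, since the O(n²) loop below would be far too slow at that scale:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def precompute_neighbors(vocab, vectors, k):
    """Build a token -> [(neighbor, similarity), ...] table once, so that
    serving top-k neighbors becomes a dict lookup instead of an ANN query."""
    table = {}
    for i, token in enumerate(vocab):
        sims = [(vocab[j], cosine(vectors[i], vectors[j]))
                for j in range(len(vocab)) if j != i]
        sims.sort(key=lambda t: t[1], reverse=True)
        table[token] = sims[:k]
    return table

# Serving then costs O(1) per request, independent of topn:
# neighbors = table["some_token"][:150]
```

With 4-byte int indexes and 4-byte float similarities per neighbor, the full top-150 table for 350K tokens fits in roughly 420 MB, as the comment notes, so the throughput ceiling moves from ANN query cost to simple memory lookup.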

0 Answers