0

How to write a stored procedure named cosine_similarity that takes in two input parameters doc1_ID (type:int) and doc2_ID (type:int), and one output parameter sim_val (type:double) to calculate the cosine-similarity value for any 2 records (corresponding to 2 documents i.e. DocIDs)?

delimiter $$
CREATE PROCEDURE cosine_similarity (IN doc1_ID int, IN doc2_ID int, OUT sim_val double)
begin'

declare var1_1 int; 
declare var2_1 VARCHAR(100);
declare var1_2 int; 
declare var2_2 VARCHAR(100);


Select term, frequency into var1_1, var2_1 from DOCTERMFREQ where DOCID=doc1_ID;
Select term, frequency into var1_2, var2_2 from DOCTERMFREQ where DOCID=doc2_ID;


set sim_val= (SUM(var2_1 * var2_2)/ SQRT(SUM(var2_1 * var2_1))/ SQRT(SUM(var2_2 * var2_2)

end $$
delimiter ;
James Z
  • 12,209
  • 10
  • 24
  • 44
liam
  • 1
  • 1
    Is there a question? nb there is a quote character after BEGIN which causes this not to syntax - is that a transcription error? – P.Salmon Mar 29 '20 at 07:34
  • I am tasked to write a stored procedure to calculate cosine similarity given two document ID . The document ID is a primary key to a set of two term matrix which I need to find the cosine similarity of – liam Mar 29 '20 at 10:50
  • no idea what cosine similarity is but a very quick search found https://stackoverflow.com/questions/42310655/sql-computation-of-cosine-similarity – P.Salmon Mar 29 '20 at 11:40
  • Here is the formula: https://www.machinelearningplus.com/nlp/cosine-similarity/ How do I write is as a store procedure to calculate this – liam Mar 29 '20 at 23:29

0 Answers0