I want to use feature extraction in my program, then estimate the optimal weight of each feature, and finally compute the score of a new input record.
For example, I have a paraphrase dataset. Each record in this dataset is a pair of sentences together with a value between 0 and 1 that indicates how similar the two sentences are. After extracting features (e.g. 4 of them), I create a new dataset containing these feature values and the similarity scores. I want to use this new dataset to learn the weights:
Paraphrase dataset:
"A problem was solved by a mathematician"; "A mathematician was found a solution for a problem"; 0.9
.
.
New dataset (F1; F2; F3; F4; similarity):
0.42; 0.61; 0.21; 0.73; 0.9
.
.
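A minimal sketch of reading the new dataset into a feature matrix and a target vector. It assumes each line holds the feature values followed by the similarity score, separated by semicolons as in the example above. The function name `load_feature_dataset` is hypothetical.

```python
def load_feature_dataset(lines):
    """Split semicolon-separated records into features X and targets y.

    Assumes the last field of each record is the similarity score and all
    preceding fields are feature values (F1..F4 in the example).
    """
    X, y = [], []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the "." placeholders used to elide records.
        if not line or line == ".":
            continue
        values = [float(v) for v in line.split(";")]
        X.append(values[:-1])
        y.append(values[-1])
    return X, y


X, y = load_feature_dataset(["0.42; 0.61; 0.21; 0.73; 0.9", "."])
print(X, y)  # [[0.42, 0.61, 0.21, 0.73]] [0.9]
```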
I want to use regression to estimate the weight of each feature, and then compute the similarity of the input sentences in my program with equation 1:
S = W1*F1 + W2*F2 + W3*F3 + W4*F4
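One way to learn W1..W4 for equation 1 is ordinary least-squares linear regression without an intercept: choose W to minimize the sum over records of (S_true - W1*F1 - ... - W4*F4)^2, which leads to the normal equations (X^T X) W = X^T y. The sketch below solves those equations in plain Python and then scores a new record. The names `fit_weights` and `score` are hypothetical, it assumes the feature matrix has full column rank, and the training rows are synthetic, generated from known weights only to check that the fit recovers them.

```python
def fit_weights(X, y):
    """Least-squares estimate of W for S = W1*F1 + ... + Wn*Fn (no intercept).

    Solves the normal equations (X^T X) W = X^T y by Gaussian elimination
    with partial pivoting. Assumes X has full column rank.
    """
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting: swap in the row with the largest pivot magnitude.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (A[i][n] - s) / A[i][i]
    return w


def score(weights, features):
    """Equation 1: weighted sum of the feature values."""
    return sum(w * f for w, f in zip(weights, features))


# Synthetic rows generated from known weights, only to check the fit.
true_w = [0.1, 0.2, 0.3, 0.4]
X = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
     [0.5, 0.5, 0.5, 0.5]]
y = [score(true_w, row) for row in X]
w = fit_weights(X, y)
print([round(v, 6) for v in w])  # [0.1, 0.2, 0.3, 0.4]
print(round(score(w, [0.42, 0.61, 0.21, 0.73]), 6))  # 0.519
```

With real, noisy data the fitted weights will not reproduce every score exactly, and a prediction can fall outside the range 0 to 1. In practice the same fit is available as `numpy.linalg.lstsq` or scikit-learn's `LinearRegression(fit_intercept=False)`, whose `coef_` attribute holds the learned weights.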
I know a regression algorithm could be used for this, but I don't know how. Could you guide me? Is there any paper or document that uses regression for this kind of task?