I have two Strings, and I want to check whether specific words appear in both of them. I already have semantic scores, but they are irrelevant here because these words are technical abbreviations that carry special emphasis. The more of these words two strings have in common, the higher the score and the closer the strings are.
There are many ways of going about this. So far I have thought of two.
1) I create two ArrayLists, firstWords and secondWords, holding the words of each string. I have another set of keywords, and I check whether each one exists in both ArrayLists. If it does, I add 1 to the score.
Then I can have multiple conditions like
if (firstWords.contains(keyWord) && secondWords.contains(keyWord))
    score++;
if (firstWords.contains(anotherKeyWord) && secondWords.contains(anotherKeyWord))
    score++;
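To make the first option concrete, here is a minimal sketch that replaces the chain of if statements with a set intersection. The class name KeywordOverlap, the keyword set, and the sample sentences are made up for illustration, and splitting on non-word characters is only an assumption about how the strings would be tokenized.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class KeywordOverlap {
    // Hypothetical keyword list; replace with the real abbreviations.
    static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("ACLS", "ASHD", "CXR"));

    // Split a sentence into a set of whole words (case-sensitive).
    static Set<String> tokenize(String sentence) {
        Set<String> words = new HashSet<>();
        for (String word : sentence.split("\\W+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Score = number of keywords that occur in both sentences.
    static int score(String first, String second, Set<String> keywords) {
        Set<String> common = tokenize(first);
        common.retainAll(tokenize(second)); // words present in both sentences
        common.retainAll(keywords);         // keep only the keywords
        return common.size();
    }

    public static void main(String[] args) {
        System.out.println(score("CXR ordered after ACLS protocol",
                                 "Repeat CXR; ACLS given", KEYWORDS));
    }
}
```

Because the comparison is on whole words, a token such as "ACLSX" would not count as "ACLS", which a substring check on the raw string would not guarantee.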
2) Take the two strings and do a regex search using
if (firstString.matches("(.*)someExpression(.*)") && secondString.matches("(.*)someExpression(.*)"))
    score++;
if (firstString.matches("(.*)someOtherExpression(.*)") && secondString.matches("(.*)someOtherExpression(.*)"))
    score++;
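The regex option could also be sketched with a single precompiled pattern instead of one match call per keyword. This is only an illustration under assumptions: the class name RegexKeywordOverlap and its method names are hypothetical, and the keywords are treated as literal, case-sensitive whole words (hence Pattern.quote and the \b word boundaries).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexKeywordOverlap {
    // Build one pattern like \b(?:ACLS|ASHD|CXR)\b from a non-empty keyword list.
    static Pattern compile(List<String> keywords) {
        StringBuilder alternation = new StringBuilder();
        for (String keyword : keywords) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(keyword)); // treat keyword literally
        }
        return Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    // Collect the distinct keywords that occur in a sentence.
    static Set<String> found(Pattern pattern, String sentence) {
        Set<String> hits = new HashSet<>();
        Matcher matcher = pattern.matcher(sentence);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    // Score = number of keywords found in both sentences.
    static int score(Pattern pattern, String first, String second) {
        Set<String> common = found(pattern, first);
        common.retainAll(found(pattern, second));
        return common.size();
    }
}
```

Compiling the pattern once and reusing it avoids recompiling a regex for every keyword and every pair of sentences, which is what repeated String.matches calls do.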
Are there better ways of doing this? I am leaning toward regex now; it looks like a pretty efficient way of doing it.
Basically, I am trying to cluster similar sentences by grouping those that contain abbreviations such as "ACLS", "ASHD", and "CXR" (common medical terms), since I know these sentences are primarily about those topics. Then I get semantic scores to group the sentences that have these words in them. Is this the wrong approach? :/
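For reference, the grouping step I have in mind looks roughly like the sketch below. The class name AbbreviationGroups and the method group are hypothetical, and matching on whole, case-sensitive words is an assumption; the semantic scoring would then run inside each group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AbbreviationGroups {
    // Map each abbreviation to the sentences that contain it as a whole word.
    // A sentence containing several abbreviations lands in several groups.
    static Map<String, List<String>> group(List<String> sentences,
                                           List<String> keywords) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String sentence : sentences) {
            List<String> words = Arrays.asList(sentence.split("\\W+"));
            for (String keyword : keywords) {
                if (words.contains(keyword)) {
                    groups.computeIfAbsent(keyword, k -> new ArrayList<>())
                          .add(sentence);
                }
            }
        }
        return groups;
    }
}
```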
Thank you :)