I'm working on a Java project where I need to match user queries against several engines. Each engine has a method similarity(Object a, Object b) that returns +1 if the objects surely match, -1 if they surely DON'T match, and any float in between when there is uncertainty.
Example: user searches "Dragon Ball".
- Engine 1 returns "Dragon Ball", "Dragon Ball GT", "Dragon Ball Z", and it claims they are DIFFERENT results (similarity=-1), no matter how similar their names look. This engine is accurate, so it has a high "weight" value.
- Engine 2 returns 100 different results. Some of them relate to DBZ, others to DBGT, etc. The engine claims they're all "quite similar" (similarity between 0.5 and 1).
- The system also queries several other engines (10+).
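
To make the setup concrete, here is a minimal sketch of how I picture combining the engines' answers into a single score. Only similarity(Object a, Object b) exists in my project; the names Engine, weight() and CombinedSimilarity are hypothetical, and the weighted average is just one assumption about how the scores could be merged.

```java
import java.util.List;

// Hypothetical interface: only similarity(a, b) exists in the real project.
interface Engine {
    /** Returns a value in [-1, +1]: +1 sure match, -1 sure mismatch. */
    float similarity(Object a, Object b);

    /** Assumed trust level of this engine; higher means more accurate. */
    double weight();
}

final class CombinedSimilarity {
    private CombinedSimilarity() {}

    /** Weighted average of all engines' scores; stays within [-1, +1]. */
    static double combine(List<Engine> engines, Object a, Object b) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (Engine e : engines) {
            weightedSum += e.weight() * e.similarity(a, b);
            totalWeight += e.weight();
        }
        // No engines (or all weights zero): treat as "no information".
        return totalWeight == 0.0 ? 0.0 : weightedSum / totalWeight;
    }
}
```

With a plain average like this, an accurate engine answering -1 pulls the score down in proportion to its weight, but a large number of lenient engines could still outvote it.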
I'm looking for a way to build clusters from these results. Two values whose similarity is near -1 should very likely end up in different clusters, even if many other values are very similar to both of them.
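
The behaviour I'm after can be illustrated with a naive greedy sketch. This is not meant as a known algorithm, only as a way to show the requirement: a pair whose similarity is at or below a "veto" threshold is never placed in the same cluster. The class name NaiveVetoClustering, both thresholds, and the sim matrix (assumed to hold one already-combined similarity per pair) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class NaiveVetoClustering {
    private NaiveVetoClustering() {}

    /**
     * sim[i][j] is the similarity in [-1, +1] between items i and j.
     * Returns clusters as lists of item indices.
     */
    static List<List<Integer>> cluster(double[][] sim,
                                       double vetoThreshold,
                                       double joinThreshold) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int item = 0; item < sim.length; item++) {
            List<Integer> best = null;
            double bestAvg = joinThreshold;
            for (List<Integer> c : clusters) {
                double sum = 0.0;
                boolean vetoed = false;
                for (int member : c) {
                    // A single "surely different" pair blocks the whole cluster.
                    if (sim[item][member] <= vetoThreshold) {
                        vetoed = true;
                        break;
                    }
                    sum += sim[item][member];
                }
                if (vetoed) {
                    continue;
                }
                double avg = sum / c.size();
                // Keep the allowed cluster with the highest average similarity.
                if (avg >= bestAvg) {
                    best = c;
                    bestAvg = avg;
                }
            }
            if (best == null) {
                // No cluster is both allowed and similar enough: start a new one.
                best = new ArrayList<>();
                clusters.add(best);
            }
            best.add(item);
        }
        return clusters;
    }
}
```

The result depends on the order in which items are processed, which is one of the reasons I would prefer an established algorithm over something like this.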
Is there a well-known clustering algorithm to solve this problem? Is there a Java implementation available? Can I build it on my own, perhaps with the help of a support library? I'm good at Java (15+ years of experience) but I'm completely new to clustering.
Thank you!