I have a set of images, and I asked MTurk workers whether, given two images, they belong to the same category or not. (There is more application-specific nuance here, but essentially we are asking whether the two images belong to the same category.)
My question is how to construct a cluster assignment from such answers, assuming every possible pair within the set has been answered. Ideally the method should also be robust to noise (we already duplicated the questions and plan to use a majority vote).
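The majority-vote step mentioned above could look like the following minimal sketch. It assumes each answer is recorded as a hypothetical `(image_a, image_b, same)` tuple, and the helper name `majority_vote` is made up for illustration. It returns the fraction of "same" votes per unordered pair rather than a hard yes/no, so ties stay visible and the fraction can be thresholded (e.g. `> 0.5`) or used as a soft score later.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def majority_vote(answers: Iterable[Tuple[str, str, bool]]) -> Dict[Tuple[str, str], float]:
    """Return, for each unordered pair, the fraction of workers who answered 'same'."""
    yes: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for a, b, same in answers:
        key = (a, b) if a <= b else (b, a)  # (A, B) and (B, A) are the same question
        total[key] += 1
        yes[key] += int(same)
    return {key: yes[key] / total[key] for key in total}


answers = [("A", "B", True), ("B", "A", True), ("A", "B", False), ("C", "A", False)]
print(majority_vote(answers))  # ('A', 'B') -> 2/3, ('A', 'C') -> 0.0
```

With an even number of duplicates a pair can end up at exactly 0.5, so an odd number of duplicates per pair avoids ties.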
For example, assume there are four images A, B, C, D, and the answers are the following:
A similar to B
C similar to D
A different from C
B different from C
A different from D
B different from D
The output should be two clusters (A, B) and (C, D). Note that we do not know the number of clusters in advance and would like to infer that from the answers.
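Because every pair has a yes/no answer and the number of clusters is unknown, this looks like an instance of correlation clustering: find the partition that disagrees with the fewest answers. One common heuristic is the randomized pivot algorithm (often called KwikCluster): pick a random unclustered image, group it with every unclustered image judged "same" as it, and repeat. The sketch below is an illustration under assumptions, not a library API: the answers are a hypothetical dict from `frozenset({a, b})` to the majority-vote boolean, and the helper names are made up. It repeats the randomized pass several times and keeps the result with the fewest disagreements.

```python
import itertools
import random
from typing import Dict, FrozenSet, List, Sequence

Answers = Dict[FrozenSet[str], bool]  # hypothetical: majority-vote answer per unordered pair


def disagreements(clusters: List[List[str]], same: Answers) -> int:
    """Correlation-clustering cost: number of pairs whose answer contradicts the clustering."""
    label = {item: i for i, cluster in enumerate(clusters) for item in cluster}
    cost = 0
    for a, b in itertools.combinations(sorted(label), 2):
        together = label[a] == label[b]
        if together != same[frozenset((a, b))]:
            cost += 1
    return cost


def pivot_clustering(items: Sequence[str], same: Answers, rng: random.Random) -> List[List[str]]:
    """One run of the randomized pivot algorithm."""
    remaining = list(items)
    rng.shuffle(remaining)  # visit pivots in random order
    clusters = []
    while remaining:
        pivot = remaining[0]
        # The pivot's cluster is the pivot plus every unclustered item answered 'same' with it.
        cluster = [pivot] + [v for v in remaining[1:] if same[frozenset((pivot, v))]]
        remaining = [v for v in remaining if v not in cluster]
        clusters.append(sorted(cluster))
    return clusters


def best_of_runs(items: Sequence[str], same: Answers, runs: int = 50, seed: int = 0) -> List[List[str]]:
    """Repeat the randomized pass and keep the clustering with the fewest disagreements."""
    rng = random.Random(seed)
    best = pivot_clustering(items, same, rng)
    for _ in range(runs - 1):
        candidate = pivot_clustering(items, same, rng)
        if disagreements(candidate, same) < disagreements(best, same):
            best = candidate
    return best


# The example from the question.
pairs = {("A", "B"): True, ("C", "D"): True, ("A", "C"): False,
         ("B", "C"): False, ("A", "D"): False, ("B", "D"): False}
same = {frozenset(p): answer for p, answer in pairs.items()}
print(sorted(best_of_runs(list("ABCD"), same)))  # [['A', 'B'], ['C', 'D']]
```

On the noise-free example every pivot order gives (A, B), (C, D), and the number of clusters falls out of the answers. If the (A, C) answer is flipped to "same", runs that start from A or C produce a three-image cluster with two disagreements, while runs starting from B or D still give (A, B), (C, D) with one disagreement, so the best-of-runs result is unchanged.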
I found some related questions on SO, but they are not exactly the same. For instance, they tend to be based on distances rather than boolean (yes/no) answers. I might be able to reduce my problem to the distance setting, but I suspect my problem is even easier than that. Related questions:
Clustering given pairwise distances with unknown cluster number?
https://stats.stackexchange.com/questions/2717/clustering-with-a-distance-matrix
Ideally, the algorithm would already have a Python implementation (e.g., in sklearn), but if not, I don't mind implementing it myself.
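If the majority-vote answers were perfectly consistent, the clusters would simply be the connected components of the graph whose edges are the "same" answers, and SciPy already implements that as `scipy.sparse.csgraph.connected_components`. A minimal sketch, assuming the answers are stored as a symmetric boolean matrix; the helper name `clusters_by_transitivity` is hypothetical. The second call illustrates why this alone is not robust to noise.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def clusters_by_transitivity(same: np.ndarray):
    """Treat every 'same' answer as an edge; each connected component is a cluster."""
    graph = csr_matrix(np.asarray(same, dtype=float))  # nonzero entries become edges
    n_clusters, labels = connected_components(graph, directed=False)
    return n_clusters, labels


# Majority-vote answers for the question's example (order A, B, C, D).
same = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=bool)
print(clusters_by_transitivity(same))  # 2 clusters: A, B share a label; C, D share another

# A single wrong "same" answer for (A, C) chains everything into one cluster.
noisy = same.copy()
noisy[0, 2] = noisy[2, 0] = True
print(clusters_by_transitivity(noisy)[0])  # 1
```

Because one erroneous "same" edge is enough to merge two clusters, this is mainly useful as a sanity check on clean data.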
Thank you.