In my work, I came across the following problem: Given a similarity matrix D, where $d_{i,j} \in \Re$ represents the similarity between objects $i$ and $j$, I would like to select $k$ objects, for $k \in {1, \dots, n}$, in such a way that minimizes the similarity between the selected objects. My first attempt to formally formulate this problem was using the following integer program:
$\minimize$ $d_{1,2}X_1X_2 + d_{1,3}X_1X_3 + \dots + d_{1,n}X_1X_n + d_{2,1}X_2X_1 + \dots + d_{n,n-1}X_nX_{n-1} $
such that $X_1 + X_2 + \dots + X_n = k$ and $X_y \in {0,1}$, for $y=1,\dots,n$
In the above program, $X_y$ indicates whether or not object $y$ was selected. Clearly, the above program is not linear. I tried to make the objective function linear by using variables $X_{1,2} $, which indicates whether or not both objects $X_1$ and $X_2$ were selected. However, I am struggling to formulate the constraint that exactly $k$ objects must be chosen, i.e., the previous constraint $X_1 + X_2 + \dots + X_n = k$.
Since I am not an expert in mathematical programming, I wonder if you could help me with this.
Thank you in advance! All the best,
Arthur