Ok, let's build an adjacency matrix W for that graph following a simple procedure:
if the adjacent vertices i and j have the same color, then the weight W_{i,j} of the edge between them is some big number (which you will tune in your experiments later); otherwise it is some small number, which you will figure out analogously.
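To make this concrete, here is a minimal sketch in Python (numpy assumed); the edge list, the colors, and the BIG/SMALL weights below are all hypothetical example values, not part of the method itself:

```python
import numpy as np

# Hypothetical example inputs: an edge list and a color per vertex.
n = 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
colors = [0, 0, 0, 1, 1, 1]

BIG, SMALL = 10.0, 0.1   # the weights you will tune later

W = np.zeros((n, n))
for i, j in edges:
    w = BIG if colors[i] == colors[j] else SMALL
    W[i, j] = W[j, i] = w  # symmetric: the graph is undirected
```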
Now, let's write the graph Laplacian as
L = D - W, where D is a diagonal matrix with elements d_{i,i} equal to the sum of the i-th row of W.
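Continuing the sketch above:

```python
D = np.diag(W.sum(axis=1))  # d_{i,i} = sum of the i-th row of W
L = D - W
```

For bigger graphs, scipy.sparse.csgraph.laplacian(W) computes the same L = D - W without building D by hand.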
Now, one can easily show that the value of
f^T L f, where f is an arbitrary vector, is small if vertices connected by edges with large weights have close f values; indeed, f^T L f = (1/2) * sum_{i,j} W_{i,j} (f_i - f_j)^2. You may think of this as a way to set up a coordinate system for the graph, with the i-th vertex having coordinate f_i in 1D space.
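A quick numerical check of that identity on the sketch above (the vector f is just an illustrative choice, nearly constant within each color group):

```python
f = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])

quad = f @ L @ f
pairwise = 0.5 * sum(W[i, j] * (f[i] - f[j]) ** 2
                     for i in range(n) for j in range(n))
print(quad, pairwise)  # identical values; the heavy (same-color) edges contribute little since their f values are close
```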
Now, let's choose some number of such vectors f^k, which gives us a representation of the graph as a set of points in a Euclidean space in which, for example, k-means works: the i-th vertex of the initial graph now has coordinates f^1_i, f^2_i, ..., and adjacent vertices of the same color in the initial graph will be close in this new coordinate space.
The question of how to choose the vectors f has a simple answer: just take a couple of eigenvectors of the matrix L which correspond to small but nonzero eigenvalues.
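A minimal sketch of that embedding step, again on the toy graph above (two eigenvectors and two clusters are illustrative choices; scikit-learn's KMeans is assumed to be available):

```python
from sklearn.cluster import KMeans

eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues come out in ascending order
F = eigvecs[:, 1:3]                   # skip the trivial zero eigenvalue; columns are f^1, f^2
labels = KMeans(n_clusters=2, n_init=10).fit_predict(F)
print(labels)  # vertices 0-2 and 3-5 should land in different clusters
```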
This is a well-known method called spectral clustering.
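In practice you rarely code this by hand: scikit-learn's SpectralClustering does essentially the same thing (it uses a normalized variant of the Laplacian under the hood), and it can take W directly as a precomputed affinity matrix:

```python
from sklearn.cluster import SpectralClustering

labels = SpectralClustering(n_clusters=2, affinity='precomputed').fit_predict(W)
```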
Further reading:
Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction,
which is available for free from the authors' page: http://www-stat.stanford.edu/~tibs/ElemStatLearn/