If you have documents d1, d2, d3, d4 and you compared every document with every other document, that would be O(N^2) comparisons. However, assuming that comparing d1 to d2 gives the same result as comparing d2 to d1, you can optimize there: you only need to compare d1-d2, d1-d3, d1-d4, d2-d3, d2-d4, d3-d4, which is N(N-1)/2 pairs. That's still O(N^2), but half the work.
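To make the pair count concrete, here's a minimal Python sketch (the document names are just placeholders) that enumerates each unordered pair exactly once:

```python
from itertools import combinations

docs = ["d1", "d2", "d3", "d4"]

# Every unordered pair exactly once: N * (N - 1) / 2 of them.
pairs = list(combinations(docs, 2))
print(pairs)       # [('d1', 'd2'), ('d1', 'd3'), ('d1', 'd4'), ('d2', 'd3'), ('d2', 'd4'), ('d3', 'd4')]
print(len(pairs))  # 6 == 4 * 3 / 2
```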
Perhaps start by building a map of all comparisons that need to be done. Then, split that map into X equal-sized collections, where X is the number of processes you want to run. Finally, spin off that many threads (or farm the work out to that many servers), let them run, then merge the results back together.
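A rough Python sketch of that idea, assuming everything runs on one machine with a multiprocessing pool (the `compare` function and the chunk count are placeholders; swap in whatever your real comparison does):

```python
from itertools import combinations
from multiprocessing import Pool

def compare(pair):
    # Placeholder: replace with your actual document comparison.
    a, b = pair
    return (a, b, len(set(a) & set(b)))  # e.g. some similarity score

def chunk(items, n):
    """Split items into n roughly equal-sized lists."""
    return [items[i::n] for i in range(n)]

def compare_chunk(pairs):
    return [compare(p) for p in pairs]

if __name__ == "__main__":
    docs = ["d1", "d2", "d3", "d4"]
    pairs = list(combinations(docs, 2))   # the "map" of all work to be done
    num_procs = 2                         # X: however many processes you want

    with Pool(num_procs) as pool:
        partial_results = pool.map(compare_chunk, chunk(pairs, num_procs))

    # Merge the per-process results back together.
    results = [r for part in partial_results for r in part]
    print(results)
```

With `multiprocessing.Pool` you could also just call `pool.map(compare, pairs)` and let it handle the chunking; the explicit split above is closer to what you'd do if you were farming the chunks out to separate servers instead.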
If you need to preprocess each document individually (a step where the pairwise comparisons don't come into play yet), then just break that work up across as many processes as you want, and distribute it among them. Without really knowing what kind of preprocessing, comparison, and document types you're dealing with, I can't get into much more detail than that.
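The per-document preprocessing parallelizes the same way, since each document is independent of the others. A hedged sketch, with `preprocess` standing in for whatever your pipeline actually does:

```python
from multiprocessing import Pool

def preprocess(doc):
    # Placeholder: tokenizing, hashing, feature extraction -- whatever you need.
    return doc.lower().split()

if __name__ == "__main__":
    docs = ["Some Document", "Another Document", "A Third One"]
    with Pool(3) as pool:
        preprocessed = pool.map(preprocess, docs)  # one document per task
    print(preprocessed)
```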