In Derby and/or MySQL, I am wondering about the performance of:
select c0.documentid from contentblob as c0 where c0.documentid > ?
and c0.blobid = ?
and exists (select * from contentblob c1 where c1.documentid = c0.documentid
and c1.blobid = ?)
order by c0.documentid
Note that DIFFERENT values will be supplied for the two blobid comparisons. What this query does is return the id's that occur in the table (at least) twice: once with blobid1 and once with blobid2.
compared to the equivalent self-join,
assuming, of course, indices on the documentid and blobid columns of the contentblob table.
In case it isn't clear, the relevant schema here is a table with two varchar columns, documentid and blobid. They are composed to form the primary key.
If there's nothing for it but to inquire of the query analyzer in the db, I'll go do that.