I'm trying to count duplicates based on a column of a table in an Oracle Database. This query using group by:
select count(dockey), sum(total)
from
(
select doc1.xdockeyphx dockey, count(doc1.xdockeyphx) total
from ecm_ocs.docmeta doc1
where doc1.xdockeyphx is not null
group by doc1.xdockeyphx
having count(doc1.xdockeyphx) > 1
)
Returns count = 94408
and sum(total) = 219330
. I think this is the correct value.
Now, trying this other query using a self join:
select count(distinct(doc1.xdockeyph))
from ecm_ocs.docmeta doc1, ecm_ocs.docmeta doc2
where doc1.did > doc2.did
and doc1.xdockeyphx = doc2.xdockeyphx
and doc1.xdockeyphx is not null
and doc2.xdockeyphx is not null
The result is also 94408 but this one:
select count(*)
from ecm_ocs.docmeta doc1, ecm_ocs.docmeta doc2
where doc1.did > doc2.did
and doc1.xdockeyphx = doc2.xdockeyphx
and doc1.xdockeyphx is not null
and doc2.xdockeyphx is not null
Is returning 1567466, which I think is wrong.
The column I'm using to find duplicates is XDOCKEYPHX and the DID is the primary key of the table.
Why is the value sum(total)
different from the result of the last query? I can't see why the last query is returning more duplicate rows than expected.