Single out duplicates between two result sets

Question

I'm having trouble with a query I'm trying to do. I'm trying to select values that come up in to result sets.

I'm currently dealing with two queries:

A)

SELECT     /*+ RULE */ pi.compressed_name, pi.phn, to_char(pi.date_of_birth , 'YYYY/MM/DD') as date_of_birth, gen.description as gender
        , to_char(pe.started_on , 'YYYY/MM/DD' ) as medicare_eligibility_start , to_char(pe.ended_on , 'YYYY/MM/DD' ) as medicare_eligibility_end
FROM    medcrtr.forest_node fnpppp,
    medcrtr.forest_node fnppp,
    medcrtr.forest_node fnpp,
    medcrtr.forest_node fnp,
    medcrtr.forest_node fn,
    medcrtr.group_member gm,
    medcrtr.group_type gt,
    medcrtr.program_eligibility pe,
    person_index pi,
    gender_type gen
WHERE   gm.entity_type_id = 1 --:P_PERSON_ENTITY_TYPE_ID
    AND gen.id = pi.gender_code
    AND gt.id = gm.group_id
    AND gt.category_id = 1 --icgroupmemebrcategory :P_GROUP_CATEGORY_ID
    AND fn.source_id = group_id
    AND fn.entity_type_id = 3 --icGOM:P_ENTITY_TYPE_ID
    AND fnp.id = fn.parent_id
    AND fnpp.id = fnp.parent_id
    AND fnppp.id = fnpp.parent_id
    AND fnpppp.id = fnppp.parent_id
    AND pe.person_id = gm.source_id
    AND pe.sub_program_id = fnpp.parent_id
    AND pi.person_id = gm.source_id
    AND fnppp.id = 1169
    AND (gm.ended_on >= SYSDATE OR gm.ended_on IS NULL)

B) Same as above other than second last line, in which

AND fnppp.id = 1715

So now the first query returns 1536 records, and the second returns 2067. There are however records(people) that come up in both queries. What I want to do is single them out. (Essentially I want to do an intersection so I"ll look into that)

So far I've tried to do the following:

-I've added the following to query A:

AND pi.person_id NOT IN (SELECT    /*+ RULE */ pi.person_id FROM ... )

where the query inside the brackets is query B (other than for the first line). This returns about 10 rows (I'm not sure that's accurate).

-I then ran the same query except with NOT IN. That gives me a result of about 200(not sure if this is accurate).

Now those two numbers obviously don't add up to 1536, which I think should happen? So obviously one or both of them is wrong.

Can anyone tell what I'm doing wrong? Thanks for your help.

Looks like you've got hierarchical data--referring to all the `medcrtr.forest_node` references--but you're not using the Oracle hierarchical syntax `CONNECT BY`... — OMG Ponies, Aug 19 '10 at 17:53
Could you clarify what you mean by "single them out"? Do you want to only see the rows that match & are attributed to both the `fnppp.id` values (use `INTERSECT`), or only see unique rows (use `UNION`) — OMG Ponies, Aug 19 '10 at 17:55
Thanks for the response. INTERSECT is what I should have been using all along. I'm trying to implement the CONNECT BY syntax though, but I'm not sure how to start, do you have any tips on where to start? Also, what are the advantages of it as opposed to the way my queries are written, other than cleanness? — The_Denominater, Aug 19 '10 at 19:03
@Denominator: Unconnected to the rest of the question, but a RULE hint? Which version of Oracle are you using? — , Aug 20 '10 at 12:50
@Mark Bannister: Actually could you explain that? I actually inherited this query from someone else, and I'm not sure how the hint applies to the query. It seems to run just as fast without it, so I would like to remove it if possible. I'm on 9i at the moment. — The_Denominater, Aug 20 '10 at 14:51
@Denominator: Oracle originally used a form of query optimisation called rule-based optimisation - this uses a set of rules (based on things like indexes on tables, order of tables in query, etc.) to optimise the query plan. Starting in (I think) Oracle 8, Oracle introduced a Cost-based optimiser, which uses DB statistics to optimise the query plan. The rule hint tells Oracle to use rule-based optimisation - normally, cost-based optimisation should be more efficient, but under some circumstances (eg. if the DB statistics are out of date) this is not always true. — , Aug 20 '10 at 15:23
@Denominator: (continued) Rule-based optimisation is not supported (read: completely removed) in Oracle 10g onwards - a query that relies on the rule hint for reasonable performance will probably perform extremely badly in Oracle 10 onwards. So in short: I strongly recommend removing this hint from the query, *especially* if it performs as well without it. If you have any plans to move to Oracle 10+, it would be a good idea to check whether there are any other rule-hinted queries being used. — , Aug 20 '10 at 15:29
See also this question: http://stackoverflow.com/questions/493492/oracle-10-optimizer-from-rule-to-cost-why. — , Aug 20 '10 at 15:31
@ Mark Bannister: Thanks for the advice, I'm going to take it out of the query since it doesn't seem to slow it down. — The_Denominater, Aug 20 '10 at 18:58

score 1 · Accepted Answer · answered Aug 19 '10 at 17:45

1

If you are truly looking for records that match between two result sets, check out information about the SQL INTERSECT command.

Looking at your query, though, I'm not sure that's what you want. If one record has fnppp.id = 1169, then how can the same record have fnppp.id = 1715?

Are you possibly looking for something that's more like SQL UNION?

answered Aug 19 '10 at 17:45

Dave McClelland

3,385
1
29
44

Thanks, I don't know how I forgot about INTERSECT, that's what I needed. – The_Denominater Aug 19 '10 at 19:01

Single out duplicates between two result sets

1 Answers1

Linked