10

Below is a question which supposedly was part of the official exam from Sun:

A Reader entity has a one-to-many, bidirectional relationship with a Book entity. Two Reader entities are persisted, each having two Book entities associated with them. For example, reader 1 has book a and book b, while reader 2 has book c and book d. Which query returns a Collection of fewer than four elements?
A. SELECT b.reader FROM Book b
B. SELECT r FROM Book b INNER JOIN b.reader r
C. SELECT r FROM Reader r INNER JOIN r.books b
D. SELECT r from Book b LEFT JOIN b.reader r LEFT JOIN FETCH r.books

Given answer is C, which I believe is incorrect. From what I understand, SQL with inner join of two tables will be generated by JPA provider. Therefore, in all cases we will get 4 records. I've run a test with one-to-many relation and duplicates were included.

Who is wrong, me or Sun?

Andriy Kharchuk
  • 1,165
  • 1
  • 13
  • 25
  • 3
    I ran the test with Hibernate, and there were duplicates as well. select distinct r can be used to remove duplicates. – JB Nizet Nov 20 '11 at 11:42
  • Thanks for checking. I would use SELECT r FROM Reader r WHERE r.books is EMPTY to get unique readers who have borrowed books... However, they claim C is the right answer. – Andriy Kharchuk Nov 21 '11 at 03:00
  • definitely, IS NOT EMPTY – Andriy Kharchuk Nov 21 '11 at 03:32
  • I have checked tutorial from Oracle and got confused even more: >SELECT DISTINCT p FROM Player p, IN(p.teams) t Data retrieved: All players who belong to a team. You may also use the JOIN statement to write the same query: SELECT DISTINCT p FROM Player p JOIN p.teams t This query could also be rewritten as: SELECT DISTINCT p FROM Player p WHERE p.team IS NOT EMPTY Why to use DISTINCT in the latter case? – Andriy Kharchuk Nov 21 '11 at 03:47
  • 1
    my tests also resulted in duplicates. Using EclipseLink 2.7.7 and MySQL 8.0.22 – Pedro Siqueira Dec 23 '20 at 20:40

2 Answers2

19

Answer from Mike Keith, EJB 3.0 co-specification lead:

There are a couple of statements related to duplicates in the spec.

  1. The JOIN FETCH is a variation of the JOIN, but it does state that similar JOIN semantics apply (except that more data is selected). The spec (section 4.4.5.3 of JPA v2.0) gives an example of duplicate Department rows being returned despite the fact that the Employee objects are not in the select clause.

  2. The more direct reference is in the SELECT section (section 4.8 of JPA v2.0), where it clearly states

"If DISTINCT is not specified, duplicate values are not eliminated."

Many JPA providers do in fact remove the duplicates for a few reasons:

a) Convenience of the users because some users are not knowledgable enough in SQL and are not expecting them b) There is not typically a use case for requiring dups c) They may be added to a result set and if object identity is maintained the dups get eliminated automatically

Andriy Kharchuk
  • 1,165
  • 1
  • 13
  • 25
  • Page 142: https://web.archive.org/web/20230718052714/https://download.oracle.com/otn-pub/jcp/persistence-2.0-fr-eval-oth-JSpec/persistence-2_0-final-spec.pdf?AuthParam=1689658154_8fc3fd7e39097534b6e9305a310227b0 – BeardOverflow Jul 18 '23 at 05:28
1

C is correct, joins to ToMany relationships should not return duplicates. The JPA provider should automatically use a distinct to filter these out. I believe this is what the spec requires, although it may be one of those less well defined areas of the spec.

If a join fetch is used, the I believe the spec actually requires the duplicates to be returned. Which is odd, can see why you would every want duplicates. If you put a distinct on a join fetch, then they will be filtered (in memory, as all rows need to be selected).

This is how EclipseLink works anyway.

All of the other cases select Books not readers, so get the duplicates, C selects Readers so should not get duplicates.

James
  • 17,965
  • 11
  • 91
  • 146
  • James, I have skimmed through specification and failed to find requirement to eliminate duplicates. However, in toplink (eclipse-link) docs I do see mentioning about removing duplicates: `Join reading can result in returning duplicate data if a one-to-many or a shared one-to-one relationship is joined. Although TopLink correctly filters the duplicate results from the object result, the duplicate data still must be fetched from the database and can degrade performance, especially if multiple one-to-many relationships are joined.` Could you please run similar query using JPA and share result. – Andriy Kharchuk Nov 21 '11 at 21:23