0

This may be a JPQL question or a SQL question, but since it came from a reading of a Criteria example in the JPA Spec. I'm tagging it JPA and Criteria API too. I think the problem is that I don't understand how SQL correlated subqueries work.

Let me begin with JPA Spec. section 6.5.12, Subqueries, Example 4: A Special Case, which says,

Note that joins involving the derived subquery root do not affect the join conditions of the containing query. The following two query definitions thus differ in semantics:

CriteriaQuery<Order> q = cb.createQuery(Order.class);
Root<Order> order = q.from(Order.class);
Subquery<Integer> sq = q.subquery(Integer.class);
Root<Order> orderSub = sq.correlate(order);
Join<Order,Customer> customer = orderSub.join(Order_.customer);
Join<Customer,Account> account = customer.join(Customer_.accounts);
sq.select(account.get(Account_.balance));
q.where(cb.lt(cb.literal(10000), cb.all(sq)));

and

CriteriaQuery<Order> q = cb.createQuery(Order.class);
Root<Order> order = q.from(Order.class);
Join<Order,Customer> customer = order.join(Order_.customer);
Subquery<Integer> sq = q.subquery(Integer.class);
Join<Order,Customer> customerSub = sq.correlate(customer);
Join<Customer,Account> account = customerSub.join(Customer_.accounts);
sq.select(account.get(Account_.balance));
q.where(cb.lt(cb.literal(10000), cb.all(sq)));

The first of these queries will return orders that are not associated with customers, whereas the second will not. The corresponding Java Persistence query language queries are the following:

SELECT o
FROM Order o
WHERE 10000 < ALL (
    SELECT a.balance
    FROM o.customer c JOIN c.accounts a)

and

SELECT o
FROM Order o JOIN o.customer c
WHERE 10000 < ALL (
    SELECT a.balance
    FROM c.accounts a)

The second query seems straightforward but the first is difficult to visualize.

My understanding of a correlated subquery is that it depends on the outer query; it uses the data of the outer query. It is not independent of the outer query. It executes once for each selected row from the outer query.

In both queries Orders are being selected in the outer query. In the first query, for each selected Order from the outer query the following subquery is run:

SELECT a.balance FROM o.customer c JOIN c.accounts a

Intuition tells me that the customer associated with the Order record is joined with account. Hence the subquery will return the balances of all accounts associated with the particular customer who is associated with the current Order record.

Am I right or wrong? It seems I'm wrong, according to the above example, which, again, says,

"Note that joins involving the derived subquery root do not affect the join conditions of the containing query," and, "The first of these queries will return Orders that are not associated with customers..."

So the subquery does not return account balances of the customer associated with the Order? What account balances are instead returned? What's the right way to see this? My head is spinning.

Patrick Garner
  • 3,201
  • 6
  • 39
  • 58

1 Answers1

0

First Query:

SELECT o
FROM Order o
WHERE 10000 < ALL (
    SELECT a.balance
    FROM o.customer c JOIN c.accounts a)
  1. The outer query is run first. The outer query selects all orders, including orders that are not associated with a customer (e.g. orders fulfilled where buyer chose not to register as a customer).
  2. Next, the subquery is run for each record selected in the outer query.

    • If the order has a customer, the subquery returns the balances of all accounts associated with the customer. If all balances are < 10000 the ALL condition evaluates to TRUE and the row of the outer query is included in the result set.

    • If a subquery returns zero rows (which would happen in this example if if there is no customer associated with the order) the ALL condition evaluates to TRUE and the row is included in the result.

The orders in the first query are not associated with customers and therefore the result set will include both orders that have customers and orders that were processed for people who decided not to register as customers.

Second Query:

SELECT o
    FROM Order o JOIN o.customer c
    WHERE 10000 < ALL (
        SELECT a.balance
        FROM c.accounts a)
  1. The outer query is run first. The outer query selects only orders that are associated with customers.

  2. Next, the subquery is run for each row.

    • If the customer associated with the order has one or more accounts and all of the account balances are< 10000 the ALL condition will evaluate to true and the row in the outer query is included in the result set.

    • If the customer associated with the order has no accounts the subquery returns zero rows the ALL condition evaluates to true and the row in the outer query is included in the result set.

The orders in the second query are associated with customers and therefore the result set will not include orders that were processed for people who decided not to register as customers.

Patrick Garner
  • 3,201
  • 6
  • 39
  • 58