I have three entities: Customer, Process and Document.

A customer has many processes, and a process has many documents.

I want to sort customers by their documents' updateDate.

My entities are as follows:

Customer-

@Entity
public class Customer {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;

    @OneToMany(mappedBy = "customer", cascade = CascadeType.ALL, fetch = FetchType.LAZY)
    private List<Process> processes = new ArrayList<>();

    // getter, setter etc.

}

Process-

@Entity
public class Process {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String type;

    @ManyToOne(fetch = FetchType.LAZY)
    private Customer customer;

    @OneToMany(mappedBy = "process", cascade = CascadeType.ALL, fetch = FetchType.LAZY)
    private List<Document> documents = new ArrayList<>();

    //getter, setter etc.

}

Document-

@Entity
public class Document {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String note;

    private LocalDateTime updateDate;

    @ManyToOne(fetch = FetchType.LAZY)
    private Process process;

}

I have tried the following specification-

    public static Specification<Customer> orderByDocumentUploadDate() {
        return (root, query, criteriaBuilder) -> {
            ListJoin<Customer, Process> processJoin = root.join(Customer_.processes);
            ListJoin<Process, Document> documentJoin = processJoin.join(Process_.documents);

            query.orderBy(criteriaBuilder.desc(documentJoin.get(Document_.updateDate)));
            query.distinct(true);
            return null;
        };
    }

It gives the following error-

ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list

Generated SQL-

    select distinct customer0_.id   as id1_0_,
                    customer0_.name as name2_0_
    from customer customer0_
             inner join
         process processes1_ on customer0_.id = processes1_.customer_id
             inner join
         document documents2_ on processes1_.id = documents2_.process_id
    order by documents2_.update_date desc
    limit ?

I have also tried grouping, like below-

    public static Specification<Customer> orderByDocumentUploadDate() {
        return (root, query, criteriaBuilder) -> {
            ListJoin<Customer, Process> processJoin = root.join(Customer_.processes);
            ListJoin<Process, Document> documentJoin = processJoin.join(Process_.documents);

            query.orderBy(criteriaBuilder.desc(documentJoin.get(Document_.updateDate)));
            query.groupBy(root.get(Customer_.id));
            return null;
        };
    }

Then it gave a different error-

ERROR: column "documents2_.update_date" must appear in the GROUP BY clause or be used in an aggregate function

Generated SQL-

    select
        customer0_.id as id1_0_,
        customer0_.name as name2_0_
    from
        customer customer0_
    inner join
        process processes1_
            on customer0_.id=processes1_.customer_id
    inner join
        document documents2_
            on processes1_.id=documents2_.process_id
    group by
        customer0_.id
    order by
        documents2_.update_date desc limit ?

I can get the result I want with the following SQL, where max() solves the problem-

    select customer.* from customer
    inner join process p on customer.id = p.customer_id
    inner join document d on p.id = d.process_id
    group by customer.id
    order by max(d.update_date);

But I can't do the same using the Criteria API.

Do you have any suggestions?

Munim
Sam

1 Answer

This is a conceptual misunderstanding.

  • First, you have to understand how an inner join works. That part is fine in this case: the process table is joined with the document table on document.process_id = process.id.

  • Second, you need to sort customers based on the document's update date.

Unfortunately, you used GROUP BY here. After grouping, a query can directly reference only the columns it is grouped by; in this case, that is the customer id.

Any other column has to go through an aggregate function such as count(), sum() or max(), which is evaluated once per group.

When you order by update_date directly, the database throws the error below:

ERROR: column "documents2_.update_date" must appear in the GROUP BY clause or be used in an aggregate function

Now, how can we get rid of this?

  • First, do the join to get the customer id for each document. Then group the data by customer id and use max() to get the latest update date of each group (or min() if you need the earliest).

SELECT 
   process.customer_id,
   max(document.update_date) AS max_date
FROM    
   document 
   JOIN process ON process.id = document.process_id
GROUP BY process.customer_id

It returns an intermediate result that looks something like this:

customer_id  max_date
1            2020-10-24
2            2021-03-15
3            2020-09-24
4            2020-03-15

Using that result as a derived table, you can now sort the customer ids by date:

SELECT
    customer_id,
    max_date
FROM    
    (SELECT 
        process.customer_id,
        max(document.update_date) AS max_date
    FROM    
        document 
        JOIN process ON process.id = document.process_id
    GROUP BY process.customer_id) AS pd
ORDER BY max_date DESC
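
To make the group-then-aggregate-then-sort order concrete, here is a minimal plain-Java sketch of the same logic on in-memory rows. It is only an illustration, not JPA code: the class LatestDocumentSort, the DocRow holder and the method customerIdsByLatestUpdate are hypothetical names, and the sample dates mirror the example result above (plus one extra, older document for customer 1).

```java
import java.time.LocalDateTime;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LatestDocumentSort {

    // Hypothetical flattened row: one (customer_id, update_date) pair per joined document.
    static final class DocRow {
        final long customerId;
        final LocalDateTime updateDate;

        DocRow(long customerId, LocalDateTime updateDate) {
            this.customerId = customerId;
            this.updateDate = updateDate;
        }
    }

    // Returns customer ids ordered by their most recent document update, newest first.
    static List<Long> customerIdsByLatestUpdate(List<DocRow> rows) {
        // Equivalent of GROUP BY customer_id with max(update_date) per group:
        // when two rows share a customer id, keep the later date.
        Map<Long, LocalDateTime> maxDateByCustomer = rows.stream()
                .collect(Collectors.toMap(
                        r -> r.customerId,
                        r -> r.updateDate,
                        (a, b) -> a.isAfter(b) ? a : b));

        // Equivalent of ORDER BY max_date DESC.
        return maxDateByCustomer.entrySet().stream()
                .sorted(Map.Entry.<Long, LocalDateTime>comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DocRow> rows = List.of(
                new DocRow(1, LocalDateTime.of(2020, 10, 24, 0, 0)),
                new DocRow(1, LocalDateTime.of(2020, 1, 5, 0, 0)),
                new DocRow(2, LocalDateTime.of(2021, 3, 15, 0, 0)),
                new DocRow(3, LocalDateTime.of(2020, 9, 24, 0, 0)),
                new DocRow(4, LocalDateTime.of(2020, 3, 15, 0, 0)));

        System.out.println(customerIdsByLatestUpdate(rows)); // prints [2, 1, 3, 4]
    }
}
```

The sort key has to be a single value per customer, which is why the aggregate comes before the ordering. In the Criteria API, the aggregate for a date-typed attribute would most likely be criteriaBuilder.greatest(...) combined with the existing groupBy, because criteriaBuilder.max(...) only accepts numeric expressions; treat that as an untested suggestion for this mapping.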

Hope this helps.

Munim
Md Kawser Habib