This feels like a trivial use case for Hibernate or JPA, but I've been struggling for a couple of days to get this to work.

I have a Position entity class that has latitude, longitude and updateTime fields (among others). I would like to count the number of distinct combinations of those three fields while ignoring the others. In SQL, this is trivial:

SELECT COUNT(*) FROM (SELECT DISTINCT LONGITUDE, LATITUDE, UPDATE_TIME FROM POSITION) AS TEMP;

It is important that I abstract my database implementation from the rest of my application, because different users may wish to use different database engines. (Heck, I use H2 for testing and MariaDB for local production...)

I have been trying to translate this SQL into Java code using either Hibernate or JPA syntax, but I cannot figure out how.

EDIT - Here is as close as I have been able to get using JPA (ref: https://en.wikibooks.org/wiki/Java_Persistence/Criteria)

public long getCountDistinctInFlightPositions() {
    Session session = sessionFactory.openSession();

    CriteriaBuilder criteriaBuilder = session.getCriteriaBuilder();

    CriteriaQuery<Tuple> innerQuery = criteriaBuilder.createTupleQuery();
    Root<Position> position = innerQuery.from(Position.class);
    innerQuery.multiselect(
        position.get("longitude"),
        position.get("latitude"),
        position.get("updateTime")
    );

    // The method countDistinct(Expression<?>) in the type CriteriaBuilder is not applicable for the arguments (CriteriaQuery<Tuple>)

    criteriaBuilder.countDistinct(innerQuery);

    return 1;
}
spierepf
  • That does not look like standard sql... shouldn't it be like `select count(distinct longitute, latitude, update_time_from_position) from some_table`? Anyway you probably want to use [`CriteriaBuilder.countDistinct`](https://docs.jboss.org/hibernate/jpa/2.1/api/javax/persistence/criteria/CriteriaBuilder.html#countDistinct(javax.persistence.criteria.Expression)) you have to build an `Expression` that evaluates to those three fields and you should obtain what you want. – Giacomo Alzetta May 18 '18 at 14:46
  • As Giacomo already stated your query should look like `select count(distinct properties) from Entity` (or build that via criteria but I didn't use those often so far). It might not work for multiple properties but you should have a few options: 1) load the rows and count the returned rows in your code (not optimal but at least it should work) 2) use a trick like concatenating the properties before counting, e.g. like suggested here: https://stackoverflow.com/questions/9184135/how-to-countdistinct-on-multiple-columns – Thomas May 18 '18 at 14:53
  • For kicks, I tried using `SELECT COUNT(DISTINCT LONGITUDE, LATITUDE, UPDATE_TIME ) FROM POSITION;` For my troubles, I got: `Syntax error in SQL statement "SELECT COUNT(DISTINCT LONGITUDE,[*] LATITUDE, UPDATE_TIME ) FROM POSITION"; expected "., (, [, ::, *, /, %, +, -, ||, ~, !~, NOT, LIKE, ILIKE, REGEXP, IS, IN, BETWEEN, AND, OR, )"; SQL statement: SELECT COUNT(DISTINCT LONGITUDE, LATITUDE, UPDATE_TIME ) FROM POSITION [42001-196] 42001/42001 (Help)` – spierepf May 18 '18 at 15:01
  • @Thomas Thanks, I tried that already. It also did not work: https://stackoverflow.com/questions/50411647/formula-field-is-not-recognized-by-client-code – spierepf May 18 '18 at 15:15
  • And counting rows in Java land is a non-starter, there will be literally millions and I don't want to move them over the wire just so I can count them. That is what databases are for. – spierepf May 18 '18 at 15:18
  • There's a 3rd alternative: use plain SQL if you can. The databases should support standard SQL anyways, so just use a native query (i.e. `EntityManager.createNativeQuery(sql)`) that looks like what Giacomo depicted. It's not ideal but should do the trick. – Thomas May 18 '18 at 15:45

1 Answer

You can do it this way:

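// Assumes an EntityManager named entityManager and an Integer "id" primary key on Position.
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Long> countQuery = cb.createQuery( Long.class );
Root<Position> root = countQuery.from( Position.class );

// Outer query: count the ids that pass the IN filter below.
countQuery.select( cb.count( root.get( "id" ) ) );

// Subquery: pick one representative id (the smallest) per distinct tuple.
Subquery<Integer> subQuery = countQuery.subquery( Integer.class );
Root<Position> subRoot = subQuery.from( Position.class );
subQuery.select( cb.min( subRoot.get( "id" ) ) );
subQuery.groupBy( subRoot.get( "longitude" ), 
  subRoot.get( "latitude" ), 
  subRoot.get( "updateTime" ) );

// Keep only the representative rows, so each distinct tuple is counted once.
countQuery.where( root.get( "id" ).in( subQuery ) );

Long count = entityManager.createQuery( countQuery ).getSingleResult();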

This effectively generates the following SQL:

SELECT COUNT( p0.id ) FROM Position p0
 WHERE p0.id IN (
   SELECT MIN( p1.id )
     FROM Position p1
    GROUP BY p1.longitude, p1.latitude, p1.updateTime )

In a scenario where I have 3 rows and 2 of them have the same tuple of longitude, latitude, and update time, the query will return a result of 2.
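To make that scenario concrete, here is a minimal in-memory sketch of the same logic in plain Java (16+ for records), with no database or JPA involved. The `Position` and `Key` records, the class name, and the sample values are hypothetical stand-ins, not the real entity; the sketch only models what the MIN(id) / GROUP BY / IN query computes.

```java
import java.time.Instant;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class DistinctPositionCount {

    // Hypothetical stand-in for the Position entity: only the fields the query touches.
    public record Position(int id, double longitude, double latitude, Instant updateTime) {}

    // Grouping key: the three columns named in the GROUP BY clause.
    public record Key(double longitude, double latitude, Instant updateTime) {}

    public static long countDistinct(List<Position> rows) {
        // Models the subquery: MIN(id) per (longitude, latitude, updateTime) group.
        Map<Key, Integer> minIdPerGroup = rows.stream().collect(Collectors.toMap(
                p -> new Key(p.longitude(), p.latitude(), p.updateTime()),
                Position::id,
                Integer::min));
        Set<Integer> representativeIds = new HashSet<>(minIdPerGroup.values());

        // Models the outer query: COUNT(id) WHERE id IN (subquery).
        return rows.stream()
                .filter(p -> representativeIds.contains(p.id()))
                .count();
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2018-05-18T14:46:00Z");
        List<Position> rows = List.of(
                new Position(1, 10.0, 20.0, t),
                new Position(2, 10.0, 20.0, t),   // same tuple as row 1
                new Position(3, 11.0, 20.0, t));  // different longitude
        System.out.println(countDistinct(rows)); // prints 2
    }
}
```

Rows 1 and 2 share a tuple, so only id 1 represents that group; together with id 3 that leaves two representative rows, matching the count of 2 described above.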

Make sure you maintain a good index on [Longitude, Latitude, UpdateTime] here so that the GROUP BY can execute faster. The PK is already B-tree indexed, so the COUNT and MIN operations should already be served by that index.

Naros