I am using EclipseLink 2.6.4
, Java 8
and DB2
for database. I have to write query to read data with NOT IN
statement providing a large set of values for NOT IN
statement.
Basically I have set of external IDs with over 10000 values:
Set<Integer> externalIDs = new HashSet<>(Arrays.asList("ExternalID1", "ExternalID2", "ExternalID3",....)); //externalIDs.size() == 10k+
Note: I know that the limit for NOT IN
statement with DB2 is 1000 values, so I am creating partioned NOT IN
statements in the query, and it looks like this:
public List<UserEntity> findNotReferencedToRemove2(Set<String> externalIds) {
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<UserEntity> cq = cb.createQuery(UserEntity.class);
Root<UserEntity> root = cq.from(UserEntity.class);
Path<String> externalId1 = root.get(UserEntity_.relation1).get(RelationEntity1_.externalId);
Path<String> externalId2 = root.get(UserEntity_.relation2).get(RelationEntity2_.externalId);
Path<String> externalId3 = root.get(UserEntity_.relation3).get(RelationEntity3_.externalId);
Path<String> externalId4 = root.get(UserEntity_.relation4).get(RelationEntity4_.externalId);
Path<String> externalId5 = root.get(UserEntity_.relation5).get(RelationEntity5_.externalId);
Predicate predicate = cb.and(
partitionedNotIn(cb, externalId1, externalIds),
partitionedNotIn(cb, externalId2, externalIds),
partitionedNotIn(cb, externalId3, externalIds),
partitionedNotIn(cb, externalId4, externalIds),
partitionedNotIn(cb, externalId5, externalIds)
);
return entityManager.createQuery(cq.where(predicate)).getResultList();
}
//creates NOT IN statement splited in chunks of 999 values connected with AND
private<C> Predicate partitionedNotIn(CriteriaBuilder cb, Path<C> path, Collection<C> ids) {
if (ids.isEmpty()) {
return cb.and();
}
return cb.and(partition(ids).stream().map(path::in).map(cb::not).toArray(Predicate[]::new));
}
private <C> Collection<List<C>> partition(Collection<C> list) {
final AtomicInteger counter = new AtomicInteger(0);
return list.stream()
.collect(Collectors.groupingBy(it -> counter.getAndIncrement() / 999))
.values();
}
Private method partionedNotIn()
just splits NOT IN
statements in chunks of 999 values to not reach maximum of 1000 values.
But as you can see I have 5 NOT IN
statements and for each 10000 values provided and in total is 50000 and I am hitting limit in DB of hosted variable length.
Anyway, the goal is to split this in chunks so I don't have 50k+ values in total for NOT IN
statements and currently don't have any idea how to achieve this. With just IN
statement it would be easy.
Any suggestion will be helpful. Thank you.