2

I am using EclipseLink 2.6.4, Java 8 and DB2 for database. I have to write query to read data with NOT IN statement providing a large set of values for NOT IN statement.

Basically I have set of external IDs with over 10000 values:

Set<Integer> externalIDs = new HashSet<>(Arrays.asList("ExternalID1", "ExternalID2", "ExternalID3",....)); //externalIDs.size() == 10k+

Note: I know that the limit for NOT IN statement with DB2 is 1000 values, so I am creating partioned NOT IN statements in the query, and it looks like this:

public List<UserEntity> findNotReferencedToRemove2(Set<String> externalIds) {

  CriteriaBuilder cb = entityManager.getCriteriaBuilder();
  CriteriaQuery<UserEntity> cq = cb.createQuery(UserEntity.class);
  Root<UserEntity> root = cq.from(UserEntity.class);

  Path<String> externalId1 = root.get(UserEntity_.relation1).get(RelationEntity1_.externalId);
  Path<String> externalId2 = root.get(UserEntity_.relation2).get(RelationEntity2_.externalId);
  Path<String> externalId3 = root.get(UserEntity_.relation3).get(RelationEntity3_.externalId);
  Path<String> externalId4 = root.get(UserEntity_.relation4).get(RelationEntity4_.externalId);
  Path<String> externalId5 = root.get(UserEntity_.relation5).get(RelationEntity5_.externalId);

  Predicate predicate = cb.and(
      partitionedNotIn(cb, externalId1, externalIds),
      partitionedNotIn(cb, externalId2, externalIds),
      partitionedNotIn(cb, externalId3, externalIds),
      partitionedNotIn(cb, externalId4, externalIds),
      partitionedNotIn(cb, externalId5, externalIds)
  );

  return entityManager.createQuery(cq.where(predicate)).getResultList();
}

//creates NOT IN statement splited in chunks of 999 values connected with AND 
private<C> Predicate partitionedNotIn(CriteriaBuilder cb, Path<C> path, Collection<C> ids) {
    if (ids.isEmpty()) {
      return cb.and();
    }
    return cb.and(partition(ids).stream().map(path::in).map(cb::not).toArray(Predicate[]::new));
  }

  private <C> Collection<List<C>> partition(Collection<C> list) {
    final AtomicInteger counter = new AtomicInteger(0);
    return list.stream()
        .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / 999))
        .values();
  }

Private method partionedNotIn() just splits NOT IN statements in chunks of 999 values to not reach maximum of 1000 values.

But as you can see I have 5 NOT IN statements and for each 10000 values provided and in total is 50000 and I am hitting limit in DB of hosted variable length.

Anyway, the goal is to split this in chunks so I don't have 50k+ values in total for NOT IN statements and currently don't have any idea how to achieve this. With just IN statement it would be easy.

Any suggestion will be helpful. Thank you.

carpics
  • 2,272
  • 4
  • 28
  • 56

1 Answers1

1

Where are you getting the values for the not in? Is it a file, then you might be able to use an external table. Or create a temporary table and insert the values into that table and use it in your query.

Steve
  • 121
  • 5